1. REAL PARTY IN INTEREST



The real party in interest of the above-captioned patent application is PROMEGA 
CORPORATION. 
2. RELATED APPEALS AND INTERFERENCES

There is an appeal in commonly assigned, copending application Serial No. 10/314,827 
which may have a bearing on the Board's decision in the present appeal. 
3. STATUS OF THE CLAIMS

The present application was filed on August 24, 2000 with 66 claims. 

Claims 1, 45, 47, and 63 were amended, claims 10, 13, 16-17, 22-23, 46, 48-59, and
65-66 were canceled, and claims 67-68 were added in the Amendment filed on August 11, 2003.
Claims 1-2, 14-15, 47, 61-63, and 67-68 were amended, claims 7-8, 19 and 40 were canceled, 
and claims 69-73 were added in the Amendment filed on April 6, 2004. Claims 1 and 67 were 
amended and claims 74-80 were added in the Amendment filed on June 4, 2004. 

Claims 1, 4-6, 9, 15, 18, 20-21, 24-37, 42-43, 45, 47, 60, 67, 69-71, 74, 76-78, and 80
were amended, claims 2, 14, 61-63, 68, 72-73, 75, and 79 were canceled, claim 64 was withdrawn,
and claims 81-82 were added in the Amendment filed on December 13, 2004. Claims 1, 47, 67,
74, and 78 were amended and claims 83-94 were added in the Amendment filed on September
22, 2005. Claims 1, 18, 44, 47, 67, 71, 74, 78, 81-85, 90, and 92 were amended, claim 89 was
canceled, and claims 95-96 were added in the Amendment filed on June 19, 2006, and claims 18,
47, 83, 90, and 95-96 were amended in the Amendment filed on February 12, 2007.

Claims 1, 3-6, 9, 11-12, 15, 18, 20-21, 24-39, 41-45, 47, 60, 67, 69-71, 74, 76-78, 80-88,
and 90-96 are pending and are the subject of this Appeal. 
4. STATUS OF AMENDMENTS

The Rule 1.116 Final Amendment filed on February 12, 2007 was entered.
5. SUMMARY OF CLAIMED SUBJECT MATTER

Some aspects of the present invention include but are not limited to synthetic nucleic acid 
molecules for reporter polypeptides that have nucleic acid sequences modified to remove 
transcription regulatory sequences. 



Independent Claim 1 

Claim 1 is directed to a first synthetic nucleic acid molecule comprising at least 300 
nucleotides of a coding region for a reporter polypeptide which has at least 90% amino acid 
sequence identity to a reporter polypeptide encoded by a wild type nucleic acid sequence (claims 
1, 7, and 14, page 3, lines 7-8 and 12-15, page 6, lines 6-7, page 7, lines 15-16, and page 35, line 
25). The wild type nucleic acid sequence encodes chloramphenicol acetyltransferase, Renilla 
luciferase, beetle luciferase, beta-lactamase, beta-glucuronidase or beta-galactosidase (page 32, 
lines 23-30 and page 36, line 29-page 37, line 8). The codon composition of the first synthetic 
nucleic acid molecule is different at more than 25% of the codons from that of the wild type 
nucleic acid sequence and is different than the codon composition of a second synthetic nucleic 
acid molecule which encodes a reporter polypeptide which has at least 90% amino acid sequence 
identity to the reporter polypeptide encoded by the wild type nucleic acid sequence (page 8, lines 
17-30). The codons in the second synthetic nucleic acid molecule that are different than the 
codons in the wild type nucleic acid sequence are mammalian high usage codons selected to 
result in the second synthetic nucleic acid molecule having a reduced number of a combination
of different mammalian transcription factor binding sequences, and optionally a reduced number 
of intron splice sites, poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences 
relative to the wild type nucleic acid sequence (page 3, lines 23-26, page 4, lines 1-5 and 18-19, 
page 7, line 24-page 8, line 3, page 37, lines 17-30, and page 65, line 21-page 66, line 3). The
codons which differ in the first synthetic nucleic acid molecule relative to the second synthetic 
nucleic acid molecule are mammalian codons selected to result in the first synthetic nucleic acid 
molecule having a reduced number of a combination of different mammalian transcription factor 
binding sequences, and optionally a reduced number of intron splice sites, poly(A) addition sites 
or prokaryotic 5' noncoding regulatory sequences, that are introduced to the second synthetic 
nucleic acid molecule by selecting the mammalian high usage codons (page 5, lines 23-26, page 
8, lines 5-30, page 37, lines 16-29, page 51, lines 3-10 and 15-21, and page 51, line 28-page 52,
line 5). The mammalian transcription factor binding sequences are those present in a database of 
transcription factor binding sequences (page 37, lines 19-21). 
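The two numerical limitations summarized above (a codon composition that differs at more than 25% of the codons, and at least 90% amino acid sequence identity) can be illustrated with a minimal sketch. The function names, the ungapped codon-by-codon comparison of equal-length sequences, and the short example sequences are assumptions made for illustration only; they are not taken from the specification and do not describe how Appellant computed these values.

```python
# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq):
    """Split a coding sequence into a list of three-base codons."""
    seq = seq.upper()
    if len(seq) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a coding sequence into a one-letter amino acid string."""
    return "".join(CODON_TABLE[c] for c in codons(seq))


def fraction_codons_changed(reference, synthetic):
    """Fraction of codon positions at which the two sequences differ."""
    ref, syn = codons(reference), codons(synthetic)
    if len(ref) != len(syn):
        raise ValueError("this sketch assumes ungapped sequences of equal length")
    return sum(a != b for a, b in zip(ref, syn)) / len(ref)


def amino_acid_identity(reference, synthetic):
    """Fraction of positions at which the encoded amino acids are identical."""
    p, q = translate(reference), translate(synthetic)
    if len(p) != len(q):
        raise ValueError("this sketch assumes ungapped sequences of equal length")
    return sum(a == b for a, b in zip(p, q)) / len(p)


# Hypothetical four-codon example (far shorter than the 300 nucleotides recited).
reference = "ATGGCTAAAGGT"   # Met-Ala-Lys-Gly
synthetic = "ATGGCCAAGGGC"   # same polypeptide, three synonymous codon changes
print(fraction_codons_changed(reference, synthetic))  # 0.75
print(amino_acid_identity(reference, synthetic))      # 1.0
```

In this hypothetical pair, three of the four codons differ while the encoded polypeptide is unchanged, so both thresholds would be met.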



Dependent Claim 90 

Claim 90, which depends on claim 1, is directed to a first synthetic nucleic acid molecule 
where the mammalian transcription factor binding sequences, intron splice sites, poly(A) 
addition sites and prokaryotic 5' noncoding regulatory sequences in the wild type nucleic acid 
sequence or the second synthetic nucleic acid sequence are identified with software (page 38, 
lines 18-24). The identified intron splice sites are selected from AGGTRAGT, AGGTRAG, 
GGTRAGT or YNCAGG, the identified poly(A) addition sites have AATAAA, the identified 
prokaryotic 5' noncoding regulatory sequences are selected from TATAAT, or AGGA or GGAG 
if a methionine codon is within 12 bases 3' of the AGGA or GGAG, and the identified 
mammalian transcription factor binding sequences are in a database of transcription factor 
binding sequences, mutant transcription factor binding sequences and consensus transcription 
factor binding sequences, and identified under parameters that allow for partial ambiguity with 
sequences in the database (page 37, lines 19-21, page 48, lines 18-24, page 48, line 29-page 49, 
line 6 and lines 15-18, and page 49, line 27-page 50, line 10). The codons are selected to reduce 
the number of identified sequences or sites (page 7, line 24-page 8, line 16 and page 48, line
17-page 52, line 11). The first synthetic nucleic acid molecule has fewer mammalian transcription
factor binding sequences than the second synthetic nucleic acid molecule which has fewer 
mammalian transcription factor binding sequences than the wild type nucleic acid sequence 
(page 7, line 24-page 8, line 16 and page 48, line 17-page 52, line 11). 
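The fixed sequence motifs recited above lend themselves to a simple pattern search. The following is a minimal sketch, not the software referred to in the specification: the function names are hypothetical, the letters R, Y and N are read as the standard IUPAC ambiguity codes, and "within 12 bases 3'" is assumed to mean that an ATG lies wholly inside the 12 bases immediately downstream of the AGGA or GGAG, without regard to reading frame. The database search for transcription factor binding sequences is not reproduced.

```python
import re

# IUPAC nucleotide codes used by the recited motifs:
# R = A or G, Y = C or T, N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

SPLICE_SITES = ["AGGTRAGT", "AGGTRAG", "GGTRAGT", "YNCAGG"]
POLY_A_SITES = ["AATAAA"]
PROMOTER_SITES = ["TATAAT"]
RIBOSOME_SITES = ["AGGA", "GGAG"]  # counted only when an ATG follows closely


def positions(motif, seq):
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[base] for base in motif)
    # A zero-width lookahead lets overlapping occurrences be reported.
    return [m.start() for m in re.finditer("(?=" + pattern + ")", seq)]


def scan_regulatory_motifs(seq):
    """Map each recited motif to the positions at which it occurs in seq."""
    seq = seq.upper()
    hits = {m: positions(m, seq)
            for m in SPLICE_SITES + POLY_A_SITES + PROMOTER_SITES}
    for motif in RIBOSOME_SITES:
        end = len(motif)
        # Assumption: keep the hit only if ATG lies inside the next 12 bases.
        hits[motif] = [p for p in positions(motif, seq)
                       if "ATG" in seq[p + end:p + end + 12]]
    return hits


# Hypothetical test sequence assembled from the recited motifs.
example = "AGGTAAGTCCAATAAACCAGGACCCCATGCC"
for motif, found in scan_regulatory_motifs(example).items():
    print(motif, found)
```

Because AGGTRAG and GGTRAGT are contained in AGGTRAGT, a single site can be reported under more than one motif; a count of distinct sites would need to merge overlapping hits.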

Independent Claim 18 

Claim 18 is directed to a synthetic nucleic acid molecule comprising SEQ ID NO:7
(GRver5), SEQ ID NO:8 (GRver6), SEQ ID NO:9 (GRver5.1), or SEQ ID NO:297 (GRver5.1),
or a nucleic acid molecule which is capable of hybridizing thereto under high stringency 
conditions, or the complement of the hybridizable nucleic acid molecule which encodes a 
luciferase. SEQ ID NOs:7, 8, 9 and 297 are synthetic nucleotide sequences of the invention
encoding a "green" click beetle luciferase. 



Independent Claim 47 

Claim 47 is directed to a first polynucleotide which hybridizes under medium stringency 
hybridization conditions to SEQ ID NO:9 (GRver5.1), SEQ ID NO:18 (RD156-1H9), SEQ ID
NO:297 (GRver5.1), SEQ ID NO:301 (RD156-1H9), or the complement thereof, and comprises
an open reading frame encoding a beetle luciferase polypeptide which has at least 90% amino 
acid sequence identity to a luciferase having SEQ ID NO:23 encoded by a corresponding wild 
type nucleic acid sequence having SEQ ID NO:1 (claim 47). SEQ ID NO:1 is a nucleotide
sequence encoding a "yellow-green" click beetle luciferase (LucPplYG) having SEQ ID NO:23.
SEQ ID NOs:18 and 301 are synthetic nucleotide sequences of the invention encoding a "red"
click beetle luciferase. The codon composition of the open reading frame of the first 
polynucleotide is different at more than 25% of the codons from that of the wild type luciferase 
nucleic acid sequence and is different than the codon composition of a second polynucleotide 
which encodes a polypeptide which has at least 90% amino acid sequence identity to the 
polypeptide encoded by the wild type nucleic acid sequence (page 8, lines 17-30). The codons in 
the second polynucleotide that are different than the codons in the wild type nucleic acid 
sequence are mammalian high usage codons selected to result in the second polynucleotide 
having a reduced number of a combination of different mammalian transcription factor binding 
sequences, intron splice sites, poly(A) addition sites or prokaryotic 5' noncoding regulatory 
sequences relative to the wild type nucleic acid sequence (page 3, lines 23-26, page 4, lines 1-5 
and 18-19, page 7, line 24-page 8, line 3, page 37, lines 17-30, and page 65, line 21-page 66, line
3). The codons which differ in the first polynucleotide relative to the second polynucleotide are 
mammalian codons selected to result in the open reading frame in the first polynucleotide having 
a reduced number of a combination of different mammalian transcription factor binding 
sequences, and optionally a reduced number of intron splice sites, poly(A) addition sites or 
prokaryotic 5' noncoding regulatory sequences, that are introduced to the second polynucleotide 
by selecting the mammalian high usage codons (page 5, lines 23-26, page 8, lines 5-30, page 37,
lines 16-29, page 51, lines 3-10 and 15-21, and page 51, line 28-page 52, line 5). The 
mammalian transcription factor binding sequences are those present in a database of transcription
factor binding sequences (page 37, lines 19-21). 



Independent Claim 67 

Claim 67 is directed to a first synthetic nucleic acid molecule comprising at least 300 
nucleotides of a coding region for a luciferase which has at least 90% amino acid sequence 
identity to a reporter polypeptide encoded by a wild type beetle luciferase nucleic acid sequence 
(claims 1 and 9). The codon composition of the first synthetic nucleic acid molecule is different 
at more than 25% of the codons from that of the wild type nucleic acid sequence and is different 
than the codon composition of a second synthetic nucleic acid molecule which encodes a 
luciferase which has at least 90% amino acid sequence identity to the luciferase encoded by the 
wild type nucleic acid sequence (claims 1 and 9, and page 8, lines 17-30). The codons in the 
second synthetic nucleic acid molecule that are different than the codons in the wild type nucleic 
acid sequence are mammalian high usage codons selected to result in the second synthetic 
nucleic acid molecule having a reduced number of a combination of different mammalian 
transcription factor binding sequences, and optionally a reduced number of intron splice sites, 
poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences relative to the wild type 
nucleic acid sequence (page 3, lines 23-26, page 4, lines 1-5 and 18-19, page 7, line 24-page 8, 
line 3, page 37, lines 17-30 and page 65, line 21-page 66, line 3). The codons which differ in the
first synthetic nucleic acid molecule relative to the second synthetic nucleic acid molecule are 
mammalian codons selected so as to result in the first synthetic nucleic acid molecule having a 
reduced number of a combination of different mammalian transcription factor binding sequences, 
and optionally a reduced number of intron splice sites, poly(A) addition sites or prokaryotic 5' 
noncoding regulatory sequences, that are introduced to the second synthetic nucleic acid 
molecule by selecting the mammalian high usage codons (page 5, lines 23-26, page 8, lines 5-30, 
page 37, lines 16-29, page 51, lines 3-10 and 15-21 and page 51, line 28-page 52, line 5). The 
mammalian transcription factor binding sequences are those present in a database of transcription 
factor binding sequences (page 37, lines 19-21). 



APPEAL BRIEF UNDER 37 CFR § 41.37 Page 10 of 64

Serial Number: 09/645,706 Dkt: 341.005US1

Filing Date: August 24, 2000

Title: SYNTHETIC NUCLEIC ACID MOLECULE COMPOSITIONS AND METHODS OF PREPARATION



Dependent Claim 95 

Claim 95, which depends on claim 67, is directed to a first synthetic nucleic acid 
molecule where the mammalian transcription factor binding sequences, intron splice sites, 
poly(A) addition sites and prokaryotic 5' noncoding regulatory sequences in the wild type nucleic 
acid sequence or the second synthetic nucleic acid sequence are identified with software (page 
38, lines 18-24). The identified intron splice sites are selected from AGGTRAGT, AGGTRAG, 
GGTRAGT or YNCAGG, the identified poly(A) addition sites have AATAAA, the identified 
prokaryotic 5' noncoding regulatory sequences are selected from TATAAT, or AGGA or GGAG
if a methionine codon is within 12 bases 3' of the AGGA or GGAG, and the identified 
mammalian transcription factor binding sequences are in a database of transcription factor 
binding sequences, mutant transcription factor binding sequences and consensus transcription 
factor binding sequences, and identified under parameters that allow for partial ambiguity with 
sequences in the database, and the codons are selected to reduce the number of identified 
sequences or sites (page 37, lines 19-21, page 48, lines 18-24, page 48, line 29-page 49, line 6 
and lines 15-18 and page 48, line 17-page 50, line 10). The first synthetic nucleic acid molecule 
has fewer mammalian transcription factor binding sequences than the second synthetic nucleic 
acid molecule, which has fewer mammalian transcription factor binding sequences than the wild 
type nucleic acid sequence (page 7, line 24-page 8, line 16 and page 48, line 17-page 52, line
11). 
Independent Claim 74

Claim 74 is directed to a first synthetic nucleic acid molecule comprising at least 300 
nucleotides of a coding region for a luciferase which has at least 90% amino acid sequence 
identity to a luciferase encoded by a parent nucleic acid sequence having SEQ ID NO:2, wherein
the codon composition of the synthetic nucleic acid molecule is different at more than 25% of the 
codons from that of the parent nucleic acid sequence and is different than the codon composition 
of a second synthetic nucleic acid molecule which encodes a luciferase which has at least 90% 
amino acid sequence identity to the luciferase encoded by the parent nucleic acid sequence 
(claims 1, 7 and 14, and page 3, lines 7-8 and 12-15, page 6, lines 6-7, page 7, lines 15-16 and 
page 35, line 25). SEQ ID NO:2 is a nucleotide sequence encoding a mutant yellow green click
beetle luciferase (YG#81-6G01) (Example 1). The codons in the second synthetic nucleic acid 
molecule that are different than the codons in the parent nucleic acid sequence are mammalian 
high usage codons selected to result in the second synthetic nucleic acid molecule having a 
reduced number of a combination of different mammalian transcription factor binding sequences, 
and optionally a reduced number of intron splice sites, poly(A) addition sites or prokaryotic 5'
noncoding regulatory sequences relative to the parent nucleic acid sequence (page 3, lines 23-26, 
page 4, lines 1-5 and 18-19, page 7, line 24-page 8, line 3, page 37, lines 17-30 and page 65, line 
21-page 66, line 3). The codons which differ in the first synthetic nucleic acid molecule relative
to the second synthetic nucleic acid molecule are mammalian codons selected to result in the first 
synthetic nucleic acid molecule having a reduced number of a combination of different 
mammalian transcription factor binding sequences, and optionally a reduced number of intron 
splice sites, poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences, that are
introduced to the second synthetic nucleic acid molecule by selecting the mammalian high usage 
codons (page 5, lines 23-26, page 8, lines 5-30, page 37, lines 16-29, page 51, lines 3-10 and
15-21, and page 51, line 28-page 52, line 5). The mammalian transcription factor binding sequences
are those present in a database of transcription factor binding sequences (page 37, lines 19-21). 

Dependent Claim 96

Claim 96, which depends on claim 74, is directed to a first synthetic nucleic acid 
molecule where the mammalian transcription factor binding sequences, intron splice sites, 
poly(A) addition sites and prokaryotic 5' noncoding regulatory sequences in the parent nucleic
acid sequence or the second synthetic nucleic acid sequence are identified with software. The 
identified intron splice sites are selected from AGGTRAGT, AGGTRAG, GGTRAGT or 
YNCAGG, the identified poly(A) addition sites have AATAAA, the identified prokaryotic 5'
noncoding regulatory sequences are selected from TATAAT, or AGGA or GGAG if a
methionine codon is within 12 bases 3' of the AGGA or GGAG, and the identified mammalian 
transcription factor binding sequences are in a database of transcription factor binding sequences, 
mutant transcription factor binding sequences and consensus transcription factor binding 
sequences, and identified under parameters that allow for partial ambiguity with sequences in the 
database, and the codons are selected to reduce the number of identified sequences or sites (page 
37, lines 19-21, page 48, lines 18-24, page 48, line 29-page 49, line 6 and lines 15-18, and page 
49, line 27-page 50, line 10). The first synthetic nucleic acid molecule has fewer mammalian 
transcription factor binding sequences than the second synthetic nucleic acid molecule which has 
fewer mammalian transcription factor binding sequences than the parent nucleic acid sequence 
(page 7, line 24-page 8, line 16 and page 48, line 17-page 52, line 11).



Independent Claim 78 

Claim 78 is directed to a first polynucleotide which hybridizes under medium stringency 
hybridization conditions to SEQ ID NO:9 (GRver5.1) or SEQ ID NO:297 (GRver5.1), or the
complement thereof, and comprises an open reading frame encoding a luciferase polypeptide 
which has at least 90% amino acid sequence identity to a luciferase encoded by a parent nucleic 
acid sequence having SEQ ID NO:2 (claims 1, 7, 14, and 47, and page 3, lines 7-8 and 12-15,
page 6, lines 6-7, page 7, lines 15-16 and page 35, line 25, and Example 1). The codon 
composition of the open reading frame of the first polynucleotide is different at more than 25% 
of the codons from that of the parent nucleic acid sequence and is different than the codon 
composition of a second polynucleotide which encodes a polypeptide which has at least 90% 
amino acid sequence identity to the luciferase encoded by the parent nucleic acid sequence 
(claims 1, 7, and 14, and page 3, lines 7-8 and 12-15, page 6, lines 6-7, page 7, lines 15-16 and 
page 35, line 25). The codons in the second polynucleotide that are different than the codons in 
the parent nucleic acid sequence are mammalian high usage codons selected to result in the 
second polynucleotide having a reduced number of a combination of different mammalian
transcription factor binding sequences, and optionally a reduced number of intron splice sites, 
poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences relative to the parent 
nucleic acid sequence (page 3, lines 23-26, page 4, lines 1-5 and 18-19, page 7, line 24-page 8, 
line 3, page 37, lines 17-30 and page 65, line 21-page 66, line 3). The codons which differ in the
first polynucleotide relative to the second polynucleotide are mammalian codons selected to 
result in the first polynucleotide having a reduced number of a combination of different 
mammalian transcription factor binding sequences, and optionally a reduced number of intron 
splice sites, poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences that are 
introduced to the second polynucleotide by selecting the mammalian high usage codons (page 5, 
lines 23-26, page 8, lines 5-30, page 37, lines 16-29, page 51, lines 3-10 and 15-21, and page 51, 
line 28-page 52, line 5). The mammalian transcription factor binding sequences are those 
present in a database of transcription factor binding sequences (page 7, line 24-page 8, line 16
and page 48, line 17-page 52, line 11).



Independent Claim 83 

Claim 83 is directed to a first polynucleotide which hybridizes under high stringency 
hybridization conditions to SEQ ID NO:9 (GRver5.1), SEQ ID NO:18 (RD156-1H9), SEQ ID
NO:297 (GRver5.1), SEQ ID NO:301 (RD156-1H9), or the complement thereof, and comprises 
an open reading frame encoding a luciferase polypeptide which has at least 90% amino acid 
sequence identity to a beetle luciferase having SEQ ID NO:23 encoded by a corresponding wild 
type nucleic acid sequence (claim 47). The codon composition of the open reading frame of the 
first polynucleotide is different at more than 25% of the codons from that of the wild type 
nucleic acid sequence (page 8, lines 17-30). 



Independent Claim 84 

Claim 84 is directed to a first polynucleotide which hybridizes under high stringency 
hybridization conditions to SEQ ID NO:9 (GRver5.1) or SEQ ID NO:297 (GRver5.1), or the
complement thereof, and comprises an open reading frame encoding a luciferase polypeptide 
which has at least 90% amino acid sequence identity to a polypeptide encoded by a parent 
nucleic acid sequence having SEQ ID NO:2 (claim 47). The codon composition of the open
reading frame of the first polynucleotide is different at more than 25% of the codons from that of 
the parent nucleic acid sequence (page 8, lines 17-30). 



Independent Claim 91 

Claim 91 is directed to a first synthetic nucleic acid molecule comprising at least 300 
nucleotides of a coding region for a reporter polypeptide which has at least 90% amino acid 
sequence identity to a reporter polypeptide encoded by a wild type nucleic acid sequence, 
wherein the codon composition of the first synthetic nucleic acid molecule is different at more 
than 25% of the codons from that of the wild type nucleic acid sequence (claims 1, 7, and 14, and 
page 3, lines 7-8 and 12-15, page 6, lines 6-7, page 7, lines 15-16, and page 35, line 25). The 
codons in the first synthetic nucleic acid molecule that are different than the codons in the wild 
type nucleic acid sequence are mammalian high usage codons selected to result in the first 
synthetic nucleic acid molecule having a reduced number of known mammalian transcription 
factor binding sequences (page 3, lines 23-26, page 4, lines 1-5 and 18-19, page 7, line 24-page 
8, line 3, page 37, lines 17-30, and page 65, line 21-page 66, line 3).



Independent Claim 92 

Claim 92 is directed to a first synthetic nucleic acid molecule comprising at least 300 
nucleotides of a coding region for a reporter polypeptide which has at least 90% amino acid 
sequence identity to a reporter polypeptide encoded by a wild type nucleic acid sequence (claims 
1, 7, and 14, and page 3, lines 7-8 and 12-15, page 6, lines 6-7, page 7, lines 15-16, and page 35, 
line 25). The first synthetic nucleic acid molecule is prepared by replacing codons in the wild 
type nucleic acid molecule with mammalian high usage codons, yielding a second synthetic 
nucleic acid molecule, and replacing codons in the second synthetic nucleic acid molecule with
mammalian codons selected to reduce the number of a combination of different, known
mammalian transcription factor binding sites, yielding the first synthetic nucleic acid molecule
(claims 1, 7, and 14, and page 3, lines 7-8 and 12-15, page 6, lines 6-7, page 7, lines 11-18, and
page 35, line 25). The codon composition of the first synthetic nucleic acid molecule is different
at more than 25% of the codons from that of the wild type nucleic acid sequence (page 8, lines
16-30). The wild type nucleic acid sequence encodes chloramphenicol acetyltransferase, Renilla
luciferase, beetle luciferase, beta-lactamase, beta-glucuronidase or beta-galactosidase (page 32, 
lines 23-30 and page 36, line 29-page 37, line 8). 
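The two-step preparation summarized for claim 92 (codon replacement yielding a second molecule, followed by further replacement yielding the first molecule) can be sketched as follows. This is an illustration under stated assumptions, not Appellant's procedure: the "preferred" codon table is a hypothetical three-entry placeholder rather than real codon usage data, the unwanted motif CCCAGG is a single concrete instance chosen for the example, and the greedy single-pass search is only one of many ways the second step could be carried out.

```python
from itertools import product

# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}

# Group codons by the amino acid they encode.
SYNONYMS = {}
for codon, aa in CODE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def split(seq):
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    return "".join(CODE[c] for c in split(seq))


def count_motifs(seq, motifs):
    """Count every (possibly overlapping) occurrence of the listed motifs."""
    return sum(seq.startswith(m, i) for m in motifs for i in range(len(seq)))


def step1_high_usage(seq, preferred):
    """Replace each codon with the preferred codon for its amino acid, if any."""
    return "".join(preferred.get(CODE[c], c) for c in split(seq))


def step2_remove_motifs(seq, motifs):
    """Greedy pass: accept a synonymous codon whenever it lowers the motif count."""
    cods = split(seq)
    best = count_motifs(seq, motifs)
    for i in range(len(cods)):
        if best == 0:
            break
        original = cods[i]
        for alt in SYNONYMS[CODE[original]]:
            cods[i] = alt
            n = count_motifs("".join(cods), motifs)
            if n < best:
                best = n
                break
        else:
            cods[i] = original  # no synonymous codon helped; restore
    return "".join(cods)


# Hypothetical inputs, for illustration only.
PREFERRED = {"P": "CCC", "R": "AGG", "G": "GGC"}   # placeholder, not usage data
UNWANTED = ["CCCAGG"]                               # one instance of YNCAGG

wild_type = "ATGCCTCGTGGT"                          # Met-Pro-Arg-Gly
second = step1_high_usage(wild_type, PREFERRED)     # introduces CCCAGG
first = step2_remove_motifs(second, UNWANTED)       # removes it again
print(second, count_motifs(second, UNWANTED))
print(first, count_motifs(first, UNWANTED), translate(first))
```

In this toy case the first step creates the unwanted motif at the junction of two replaced codons, and the second step removes it with a synonymous codon while leaving the encoded polypeptide unchanged.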

This summary does not provide an exhaustive or exclusive view of the present subject 
matter, and Appellant refers to the appended claims and their legal equivalents for a complete
statement of the invention. 
6. GROUNDS OF REJECTION TO BE REVIEWED ON APPEAL

The 35 U.S.C. § 112, Second Paragraph, Rejections to be Reviewed

Whether claims 1, 3-6, 9, 11-12, 15, 20-21, 24-39, 41-45, 47, 60, 67, 69-71, 74, 76-78,
80-82, 85-88, and 90-96 are unpatentable under 35 U.S.C. § 112, second paragraph.

Whether claim 90, which depends on claim 1, is unpatentable under 35 U.S.C. § 112,
second paragraph. 

The 35 U.S.C. § 112, First Paragraph, "Enablement" Rejection to be Reviewed

Whether claims 1, 3-6, 9, 11-12, 15, 20-21, 24-33, 35-39, 41-45, 47, 60, 67, 69-70, 81-82, 
86-88, and 90-95 lack enablement under 35 U.S.C. § 112, first paragraph. 

The 35 U.S.C. § 103(a) Rejections to be Reviewed

Whether claims 1, 3-6, 9, 11-12, 15, 20-21, 24-39, 41-45, 60, 67, 69-70, 81, 86, and 90-95
are unpatentable under 35 U.S.C. § 103(a) over Sherf et al. (U.S. Patent No. 5,670,356) in
view of Zolotukhin et al. (U.S. Patent No. 5,874,304), Donnelly et al. (WO 97/47358), Pan et al.
(Nucl. Acids Res., 27:1094 (1999)), Cornelissen et al. (U.S. Patent No. 5,952,547), and Hey et
al. (U.S. Patent No. 6,169,232). 

Whether claim 95, which depends on claim 67, is unpatentable under 35 U.S.C. § 103(a)
in view of Sherf et al., Zolotukhin et al., Donnelly et al., Pan et al., Cornelissen et al., and Hey et 
al. 

Whether claims 18, 47, 71, 74, 76-78, 80, 82-85, 87-88, and 96 are unpatentable under 35 
U.S.C. § 103(a) over Sherf et al., in view of Zolotukhin et al., Donnelly et al., Pan et al.,
Cornelissen et al., and Hey et al., and further in view of Wood et al. (WO 99/14336). 

Whether claim 96, which depends on claim 74, is unpatentable under 35 U.S.C. § 103(a) 
over Sherf et al., Zolotukhin et al., Donnelly et al., Pan et al., Cornelissen et al., Hey et al., and
further in view of Wood et al. 



7. ARGUMENT

I. The 35 U.S.C. § 112, Second Paragraph, Rejection 

A) The Applicable Law under 35 U.S.C. §112, Second Paragraph 

In rejecting a claim under the second paragraph of 35 U.S.C. § 112, it is incumbent on the
Examiner to establish that one of ordinary skill in the pertinent art, when reading the claims in 
light of the supporting specification, would not have been able to ascertain with a reasonable 
degree of precision and particularity the particular area set out and circumscribed by the claims. 
Ex parte Wu, 10 U.S.P.Q.2d 2031, 2033 (B.P.A.I. 1989) (citing In re Moore, 439 F.2d 1232, 169
U.S.P.Q. 236 (C.C.P.A. 1971); In re Hammack, 427 F.2d 1378, 166 U.S.P.Q. 204 (C.C.P.A.
1970)). 

The M.P.E.P. adopts this line of reasoning: 

whether the claims set out and circumscribe a particular subject matter with a 
reasonable degree of clarity and particularity. Definiteness of claim language
must be analyzed, not in a vacuum, but in light of: (A) The content of the 
particular application disclosure; (B) The teachings of the prior art; and (C) The 
claim interpretation that would be given by one possessing the ordinary level of 
skill in the pertinent art at the time the invention was made. In reviewing a claim 
for compliance with 35 U.S.C. 112, second paragraph, the examiner must
consider the whole claim to determine whether the claim apprises one of ordinary 
skill in the art of its scope and, therefore, serves the notice function required by 35 
U.S.C. 112, second paragraph, by providing clear warning to others as to what
constitutes infringement of the patent. 

M.P.E.P. § 2173.02 (emphasis added). 

Moreover, if the language is as precise as the subject matter permits, the courts can demand no 
more. Shatterproof Glass Corp. v. Libbey-Owens Ford Co., 758 F.2d 613, 624, 225 U.S.P.Q.
634, 641 (Fed. Cir. 1985), cert. dismissed, 474 U.S. 976 (1985) (quoting Georgia-Pacific Corp.
v. United States Plywood Corp., 258 F.2d 124, 136, 118 U.S.P.Q. 122, 132 (2d Cir.), cert.
denied, 358 U.S. 884, 119 U.S.P.Q. 501 (1958)).
B) The Examiner's Position

The Examiner rejected claims 1, 3-6, 11-12, 15, 20-21, 24-39, 41-45, 47, 60, 67, 69-71,
74, 76-78, 80-82, 85-88, and 90-96 under 35 U.S.C. § 112, second paragraph, as indefinite for
the recitation of "a reduced number of a combination of mammalian transcription factor binding 
sequences, intron splice sites, poly(A) addition sites and/or prokaryotic 5' noncoding regulatory 
sequences", "wherein the mammalian transcription factor binding sequences are present in a 
database of transcription factor binding sequences" and "known mammalian transcription factor 
binding sequences." In particular, the Examiner asserts that the phrases at issue define a group 
of sequences related by function and the art does not define what sequences are included in the 
group, and so it would not be possible to quantify the number of such sequences. 

C) Appellant's Position

It is Appellant's position that those skilled in the art, even in the absence of Appellant's 
specification, understand the metes and bounds of the phrases: transcription factor binding 
sequences (TFBS), intron splice sites, poly(A) addition sites, and prokaryotic 5' noncoding 
regulatory sequences, as they are conventionally used and understood by the art. See, e.g., U.S. 
Patent No. 5,670,356 ("transcription factor binding sites"), Donnelly et al. (WO 97/47358)
("intron splice sites"), Iannacone et al., Plant Mol. Biol., 34:485 (1997) ("polyA sequences"),
Pan et al., Nucl. Acids Res., 27:1094 (1999) ("prokaryotic promoters," "poly(A) signals," and
"exon-intron boundaries"), Faisst and Meyer, Nucl. Acids Res., 20:3 (1992) (which discloses a
compilation of vertebrate encoded transcription factors), Mount, Am. J. Hum. Genet., 67:788
(2000) (consensus and other conserved splice sites), Jensen et al., Appl. Environ. Microbiol.,
64:82 (1998) (synthetic promoters with known consensus sequences, see abstract), Hsieh et al., J.
Bacteriol., 177:5740 (1995) ("a potential ribosome-binding site", see abstract and Figure 3; also
see page 5742 and Figure 3 for conserved promoter sequence motifs "-10" and "-35"), and
Andrews et al., J. Virol., 67:7705 (1993) ("a canonical poly(A) consensus signal"; see abstract)
(see Evidence Appendix). Moreover, the Examiner has acknowledged that those terms are 
conventional in the art (page 3 of the Office Action mailed September 13, 2006; see the 
Evidence Appendix). 
Even assuming, for the sake of argument, the metes and bounds of the phrases
TFBS, intron splice sites, poly(A) addition sites, and prokaryotic 5' noncoding regulatory 
sequences were not readily recognizable to the art worker, Appellant's specification 
discloses that a transcription regulatory element or a transcription regulatory sequence is 
a genetic element that controls some aspect of the expression of nucleic acid sequence(s), 
and includes a promoter, transcription factor binding sites, splicing signals, 
polyadenylation signals, termination signals, and enhancer elements (page 23, lines 24- 
30). 

Promoters and enhancers are disclosed as typically including short arrays of DNA 
sequences that interact specifically with cellular proteins involved in transcription (page 
24, lines 2-4). A "poly(A) sequence" is disclosed as a DNA sequence associated with the 
termination and polyadenylation of a nascent RNA transcript (page 25, lines 10-12). 
Splicing signals are disclosed as mediating the removal of introns from the primary RNA 
transcript and consist of a splice donor and acceptor site (page 24, lines 1-2). 

Moreover, the specification discloses that TFBS, intron splice sites, poly(A) 
addition sites, and prokaryotic 5' noncoding regulatory sequences can be identified using
databases and software, e.g., databases and software such as TRANSFAC, TESS, EPD,
NNPD, REBASE, GenePro, MAR and BCM GeneFinder (page 38, lines 20-23).
Particular TFBS, intron splice sites, poly(A) addition sites, and prokaryotic 5' noncoding 
regulatory sequences are shown at page 48, lines 18-24, page 49, lines 3-6 and 17, and 
page 50, lines 23-26 of the specification. 

In addition, with regard to the phrases "wherein the mammalian transcription 
factor binding sequences are those present in a database of transcription factor binding 
sequences" and "known mammalian transcription factor binding sequences," one of skill 
in the art is aware of databases having transcription factor binding sequences, e.g., see 
page 38 and the Examples in the specification, and the Rule 132 Declaration filed on June 
19, 2006 and executed by Monika Wood, a co-inventor of the above-referenced 
application (Evidence Appendix), or is aware of other sources of mammalian 
transcription factor binding sites (e.g., Faisst and Meyer, Nucl. Acids Res. , 20:3 (1992); 
Evidence Appendix).

Further, each of the recited classes of sequences or sites has a definite property that is
recognizable (and testable) by one of skill in the art. For instance, a sequence can be tested for
whether it can terminate transcription and initiate polyadenylation at the end of an RNA
transcript (a poly(A) addition site); whether it can direct transcription of a gene (promoter);
whether it can signal where a primary RNA transcript is to be spliced to form an mRNA (splice
site); or whether it binds a transcription factor. There is nothing intrinsically wrong in using
functional language, defining something by what it does rather than by what it is, in drafting
patent claims; courts have even recognized the practical necessity for the use of functional
language. In re Swinehart, 169 U.S.P.Q. 226, 228 (C.C.P.A. 1971).

In particular, claim 90 recites that the identified intron splice sites are selected from 
AGGTRAGT, AGGTRAG, GGTRAGT or YNCAGG, the identified poly(A) addition sites have 
AATAAA, the identified prokaryotic 5' noncoding regulatory sequences are selected from 
TATAAT, or AGGA or GGAG if a methionine codon is within 12 bases 3' of the AGGA or 
GGAG, and the identified mammalian transcription factor binding sequences are in a database of 
transcription factor binding sequences, mutant transcription factor binding sequences and 
consensus transcription factor binding sequences, and identified under parameters that allow for 
partial ambiguity with sequences in the database. 
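
As an illustration only, the fixed sequence motifs recited in claim 90 lend themselves to a simple computational count. The following Python sketch assumes the standard IUPAC ambiguity codes (R = A or G, Y = C or T, N = any base) and reads "within 12 bases 3'" as requiring that an ATG begin within the 12 bases immediately downstream of the AGGA or GGAG. The function names and that reading are assumptions made for this example and are not taken from the specification. Nested motifs (e.g., AGGTRAG within AGGTRAGT) are counted separately, and transcription factor binding sequences, which claim 90 identifies by reference to a database, are not modeled.

```python
import re

# IUPAC nucleotide ambiguity codes needed for the claim 90 motifs.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

SPLICE_SITES = ["AGGTRAGT", "AGGTRAG", "GGTRAGT", "YNCAGG"]
POLYA_SITES = ["AATAAA"]
PROMOTER_SITES = ["TATAAT"]
RBS_SITES = ["AGGA", "GGAG"]  # counted only when an ATG follows closely


def find_motif(seq, motif):
    """Return 0-based start positions of all matches, including overlaps."""
    pattern = "".join(IUPAC[base] for base in motif)
    # A lookahead lets re.finditer report overlapping occurrences.
    return [m.start() for m in re.finditer("(?=(" + pattern + "))", seq)]


def count_sites(seq):
    """Count motif hits per class in one strand of a DNA sequence."""
    seq = seq.upper()
    counts = {
        "splice": sum(len(find_motif(seq, m)) for m in SPLICE_SITES),
        "polyA": sum(len(find_motif(seq, m)) for m in POLYA_SITES),
        "promoter": sum(len(find_motif(seq, m)) for m in PROMOTER_SITES),
    }
    rbs = 0
    for motif in RBS_SITES:
        for start in find_motif(seq, motif):
            end = start + len(motif)
            # Assumed reading: ATG must start within the next 12 bases,
            # so the window is 12 start positions plus 2 trailing bases.
            if "ATG" in seq[end:end + 14]:
                rbs += 1
    counts["rbs"] = rbs
    return counts


if __name__ == "__main__":
    print(count_sites("CCAATAAACCAGGACCCCATGC"))
```

Running the same count on a parent sequence and on a synthetic sequence gives the two numbers whose comparison the claims call a "reduced number"; only one strand is scanned in this sketch.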

Therefore, one of skill in the art, whether in the absence of Appellant's specification or
alternatively in possession of Appellant's specification, would understand
the metes and bounds of the phrases "mammalian transcription factor binding sequences," "intron
splice sites," "poly(A) addition sites," "prokaryotic 5' noncoding regulatory sequences,"
"wherein the mammalian transcription factor binding sequences are those present in a database
of transcription factor binding sequences" and "known mammalian transcription factor binding
sequences" in the claims.

With regard to calculating the number of mammalian transcription factor binding 
sequences, intron splice sites, poly(A) addition sites and prokaryotic 5' noncoding regulatory 
sequences, since those sequences can be identified, as discussed above, the number present in a 
polynucleotide can likewise be calculated. 

The Board is requested to consider that Example 1 in Appellant's specification discloses 
that synthetic click beetle luciferase sequences were prepared that had a reduced number of a 
combination of mammalian transcription factor binding sequences, as well as intron splice sites,
poly(A) addition sites and prokaryotic 5' noncoding regulatory sequences. In particular, it is 
disclosed that mammalian codon replacement in a parent click beetle luciferase sequence 
(YG#81-6G01) yielded a mammalian codon optimized click beetle luciferase sequence 
(GRver1). Removal of intron splice sites, poly(A) addition sites and prokaryotic 5' noncoding
regulatory sequences, e.g., promoter sequences, in the mammalian codon optimized click beetle 
luciferase sequence by codon replacement, resulted in a sequence, GRver2, that had about 100 
mammalian transcription factor binding sequences. Replacement of codons in GRver2 to 
remove those mammalian transcription factor binding sequences, and intron splice sites, poly(A) 
addition sites and prokaryotic 5' noncoding regulatory sequences, yielded a sequence, GRver3, 
that had about 50 newly introduced mammalian transcription factor binding sequences. 
Replacement of codons in GRver3 to remove those mammalian transcription factor binding 
sequences, and intron splice sites, poly(A) addition sites and prokaryotic 5' noncoding regulatory 
sequences, yielded a sequence, GRver4, that had about 20 newly introduced mammalian 
transcription factor binding sequences. Those newly introduced mammalian transcription factor 
binding sequences were removed by codon replacement to yield GRver5.

Moreover, as described in the Rule 132 Declaration filed on June 19, 2006, using 
software and a database that are available to the public and comparable to those disclosed in the 
application, Ms. Wood determined the number of mammalian transcription factor binding 
sequences in luc+, a sequence described in Sherf et al. (U.S. Patent No. 5,670,356), a reference
cited against the claims under 35 U.S.C. § 103(a). 

Accordingly, the calculation of the number of transcription factor binding sequences, 
intron splice sites, poly(A) addition sites and prokaryotic 5' noncoding regulatory sequences 
(prokaryotic promoter sequences hereinafter) in a sequence is possible and can be determined by 
one of skill in the art. 

Regarding the Examiner's alleged change in scope of claims which recite mammalian 
transcription factor binding sequences, and optionally intron splice sites, poly(A) addition sites 
and promoter sequences, it is Appellant's position that intron splice sites, poly(A) addition sites, 
and prokaryotic promoter sequences represent relatively conserved sequences that were well 
known prior to Appellant's effective filing date (see Mount, supra; Jensen et al., supra; Hsieh et
al., supra; and Andrews et al., supra). And although there may be new members added to the
group "mammalian transcription factors" over time, the independent claims in the present 
application provide that the synthetic reporter nucleic acid molecules have a reduced number of a 
combination of different mammalian transcription factor binding sequences, as a result of codon 
replacement of at least 25% of the codons of a wild type or parent reporter nucleic acid sequence, 
with mammalian codons including mammalian high usage codons. 

The reduced number of mammalian transcription factor binding sequences, and 
optionally intron splice sites, poly(A) addition sites and promoter sequences in Appellant's 
synthetic nucleic acid molecules is relative to a corresponding parent or wild type nucleic acid 
sequence. Wild type nucleic acid sequences for chloramphenicol acetyltransferase, Renilla
luciferase, beetle luciferase, beta-lactamase, beta-glucuronidase or beta-galactosidase were
known to the art prior to Appellant's filing (Wood et al., Science, 244:700 (1989), Ye et al.,
Biochim. Biophys. Acta, 1339:39 (1997), Figure 2 in Murray et al., J. Mol. Biol., 254:993 (1995),
see Figure 4 in Zhang et al., Proc. Natl. Acad. Sci. USA, 94:4504 (1997), Lorenz et al., Proc.
Natl. Acad. Sci. USA, 88:4438 (1991), and see references 1 and 21 in Sirot et al., Antimicr.
Agents Chemo., 41:1322 (1997)). Moreover, the parent reporter nucleic acid sequence recited in
claims 1 and 67 is a wild type nucleic acid sequence encoding chloramphenicol acetyltransferase,
Renilla luciferase, beetle luciferase, beta-lactamase, beta-glucuronidase or beta-galactosidase,
and the parent reporter nucleic acid sequence recited in claims 47, 74 and 78 has a specified wild
type nucleic acid sequence (claim 47; SEQ ID NO:1) or specified parent nucleic acid sequence
(claims 74 and 78; SEQ ID NO:2).

Thus, Appellant's synthetic nucleic acid molecules are readily recognized by one of skill 
in the art. That is because they are reporter encoding nucleotide sequences with at least 25% of 
codons replaced with mammalian codons relative to a corresponding wild type (or parent) 
reporter nucleic acid sequence, and with a reduction in a combination of different mammalian 
transcription factor binding sequences, as well as optionally intron splice sites, poly(A) addition 
sites and prokaryotic 5' noncoding regulatory sequences, resulting from codon replacement. The 
presence of at least 25% mammalian codons including mammalian codons which are not "high 
usage" mammalian codons, and the reduction in a combination of different mammalian 
transcription factor binding sequences, and optionally intron splice sites, poly(A) addition sites 
and prokaryotic 5' noncoding regulatory sequences, result in the synthetic nucleic acid molecules
of the invention being significantly divergent in nucleotide sequence relative to the 
corresponding wild type or parent reporter nucleic acid sequence. 
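
For illustration, the two quantitative limitations discussed above (the fraction of codons that differ from the parent sequence, and the amino acid identity of the encoded polypeptide) can be checked with a short calculation. The Python sketch below is a minimal example under stated assumptions: both inputs are in-frame coding sequences of equal length with no insertions or deletions, the standard genetic code applies, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(seq):
    """Split an in-frame coding sequence into a list of codons."""
    seq = seq.upper()
    if not seq or len(seq) % 3:
        raise ValueError("sequence must be non-empty and a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def _paired(parent, synthetic):
    p, s = codons(parent), codons(synthetic)
    if len(p) != len(s):
        raise ValueError("sequences must encode the same number of codons")
    return list(zip(p, s))


def fraction_codons_changed(parent, synthetic):
    """Fraction of codon positions whose codon differs from the parent."""
    pairs = _paired(parent, synthetic)
    return sum(a != b for a, b in pairs) / len(pairs)


def amino_acid_identity(parent, synthetic):
    """Fraction of positions encoding the same amino acid."""
    pairs = _paired(parent, synthetic)
    return sum(CODON_TABLE[a] == CODON_TABLE[b] for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    parent, synthetic = "ATGGCTAAA", "ATGGCCAAG"
    print(fraction_codons_changed(parent, synthetic))
    print(amino_acid_identity(parent, synthetic))
```

In this toy input two of three codons are replaced with synonymous codons, so the nucleotide sequence diverges while the encoded polypeptide is unchanged. Sequences with insertions or deletions would need a proper alignment first, which this sketch does not attempt.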

Therefore, the claims meet the requirements of 35 U.S.C. § 112, second paragraph.

II. The 35 U.S.C. § 112, First Paragraph, "Enablement" Rejection



A) The Applicable Law under 35 U.S.C. § 112, First Paragraph (Enablement) 

The specification shall contain a written description of the invention and of the manner 
and process of making and using it, in such full, clear, concise, and exact terms as to enable any 
person skilled in the art to which it pertains... to make and use the same, and shall set forth the 
best mode contemplated by the inventor of carrying out his invention. 35 U.S.C. § 112(1).

It is well-settled that it is not necessary that a patent applicant have prepared and tested
all the embodiments of his invention in order to meet the requirements of § 112. In re Angstadt,
190 U.S.P.Q. 214, 218 (C.C.P.A. 1976). Furthermore, enablement is not precluded by the
necessity for some experimentation, such as routine screening. The key word is "undue" not
"experimentation." In re Angstadt, 190 U.S.P.Q. 214, 219 (C.C.P.A. 1976). In fact, a
considerable amount of experimentation is permissible if it is merely routine, or the specification
provides a reasonable amount of guidance with respect to the direction in which the
experimentation should proceed. Ex parte Jackson, 217 U.S.P.Q. 804, 807 (Bd. App. 1982).

B) The Examiner's Position

The Examiner rejected claims 1, 3-6, 9, 11-12, 15, 20-21, 24-33, 35-39, 41-45, 57, 60, 67,
69-70, 81-82, 86-88, and 90-95 under 35 U.S.C. § 112, first paragraph. The Examiner asserts
that the specification does not reasonably provide enablement for any variant DNA molecules
encoding any reporter polypeptide having at least 90% identity to a wild type reporter
polypeptide, having more than 25% of the codons altered, and having a reduced number of
transcription factor binding sequences, intron splice sites, poly(A) addition sites and 5'
noncoding regulatory sequences than a mammalian codon optimized version of the parent
nucleic acid or to any nucleic acid which will hybridize to SEQ ID NO:9 under medium
stringency conditions.

Specifically, the Examiner asserts that each of the groups of reporters that are beetle 
luciferases, beta-glucuronidases, chloramphenicol acetyltransferases and beta-lactamases, 
includes vast numbers of proteins which are not well characterized and often substantially 
different from those taught in the art. The Examiner further asserts that it is not routine in the art 
to screen for multiple substitutions or multiple modifications in proteins such as reporter 
proteins, and that the number of modifications encompasses many sequences, not all of which 
are active, and so it would require undue experimentation to make and test those modified 
sequences. 



C) Appellant's Position

One skilled in the art, having read Appellant's specification, would know how to make 
and use one or more synthetic nucleic acid molecules encoding a chloramphenicol 
acetyltransferase, beetle or Renilla luciferase, beta-lactamase, beta-glucuronidase or beta-
galactosidase that may not be identical in amino acid sequence, but has at least 90% identity, to a
reporter polypeptide encoded by a wild type or parent nucleic acid sequence for chloramphenicol 
acetyltransferase, beetle or Renilla luciferase, beta-lactamase, beta-glucuronidase or beta- 
galactosidase. 

First, prior to the filing date of the present application, nucleotide sequences encoding, 
and amino acid sequences for, various reporter polypeptides were known (see, e.g., Bouthors et
al., Protein Eng., 12:313 (1999); Lorenz et al., Proc. Natl. Acad. Sci. USA, 88:4438 (1991);
Matsumura et al., Nat. Biotech., 17:696 (1999); Murray et al., J. Mol. Biol., 254:993 (1995);
Sirot et al., Antimicr. Agents Chemo., 41:1322 (1997); Voladri et al., J. Bacteriol., 178:7248
(1996); Zhang et al., Proc. Natl. Acad. Sci. USA, 94:4504 (1997); and Wood et al., Science, 244:700
(1989); see Evidence Appendix). Applicant need not teach what is known to the art. Hybritech,
Inc. v. Monoclonal Antibodies, Inc., 231 U.S.P.Q. 81, 94-95 (Fed. Cir. 1986).

Moreover, prior to the filing date of the present application, the relative frequency of 
codons employed in different organisms was known (see Aota et al., Nucl. Acids Res., 16:315
(1988), "Codon Usage Tabulated from GenBank Genetic Sequence Data"; Murray et al., Nucl.
Acids Res., 17:477 (1989), "Codon Usage in Plants"; Wada et al., Nucl. Acids Res., 18:2367
(1990), "Codon Usage Tabulated from GenBank Genetic Sequence Data"; Sharp et al., Nucl.
Acids Res., 16:8207 (1988), "Codon Usage Patterns in Escherichia coli, Bacillus subtilis,
Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo
sapiens; A Review of the Considerable Within Species Diversity"; and Sharp et al., Nucl. Acids
Res., 15:1281 (1987), "The Codon Adaptation Index - A Measure of Directional Synonymous
Codon Usage Bias, and its Potential Applications"; in Evidence Appendix). The specification
also discloses codons used more frequently in human cells (page 4, lines 24-36) and codons used
more frequently in plant cells (page 7, lines 1-10). Codon replacement in particular sequences is
also described in U.S. Patent No. 5,670,356, Donnelly et al. (WO 97/47358), and Pan et al.
(Nucl. Acids Res., 27:1094 (1999)) (see Evidence Appendix).

Further, prior to Appellant's filing, TFBS, intron splice sites, poly(A) addition sites and 
prokaryotic 5' noncoding regulatory sequences were known. See Faisst and Meyer, supra, 
Mount, supra; Jensen et al., supra; Hsieh et al., supra; and Andrews et al., supra. Moreover, it 
was within the skill of the art to test whether a particular sequence binds transcription factors, is 
a splice donor or splice acceptor, is a poly(A) addition site, or initiates transcription in a 
prokaryotic system. 

Appellant's specification describes altering the structure of a parent reporter nucleic acid 
sequence by iterative codon replacement to reduce TFBS, intron splice sites, poly(A) addition 
sites, and prokaryotic 5' noncoding regulatory sequences. Therefore, the specification, in view of 
the knowledge of the art worker at the time of Appellant's filing, enables synthetic reporter 
polypeptides of the invention regardless of the source of the parent reporter nucleic acid 
sequence, e.g., whether the parent nucleic acid sequence is a wild type or variant nucleic acid 
sequence. 

Thus, a reporter polypeptide encoded by a synthetic nucleic acid molecule of the 
invention may include codons that result in amino acid substitutions (the synthetic nucleic acid 
sequence encodes a polypeptide with at least 90% amino acid sequence identity to a 
corresponding wild type reporter polypeptide). Appellant has provided evidence that it is well 
within the skill of the art to introduce substitutions into various reporter proteins and yield a 
variant protein with a detectable activity, e.g., an activity of the corresponding wild type reporter
protein (see, e.g., Stapleton et al., Antimicrob. Agents Chemother., 43:1881 (1999); Bouthors et
al., Protein Eng., 12:313 (1999); Voladri et al., J. Bacteriol., 178:7248 (1996); Murray et al., J.
Mol. Biol., 254:993 (1995); and Matsumura et al., Nat. Biotechnol., 17:696 (1999); see Evidence
Appendix).

In particular, with regard to luciferases, numerous substitutions have been identified in or 
introduced into beetle luciferases without affecting the reporter property of the substitution 
variants (see, e.g., Kajiyama et al., Protein Engineering, 4:691 (1991), Wood et al., J. Biolumin.,
4:31 (1989), Wood et al., J. Biolumin., 5:107 (1990), Sala-Newby et al., Biochem. J.,
279:727 (1991), and U.S. Patent Nos. 5,670,356 and 6,602,677; see Evidence Appendix). Note that
LucPpyYG (SEQ ID NO:23), a wild type sequence, and YG#81-6G01 (SEQ ID NO:24) have
over 95% amino acid sequence identity to each other and both function as reporters. Further, in
U.S. Patent No. 6,602,677, five mutant luciferases are disclosed that have 12, 21, 32, 37 and 37
substitutions, respectively, relative to a parent luciferase, and function as reporters. Also, the
Board is requested to note that in Example 1 of the above-referenced application, the amino acid
sequence of the click beetle luciferases encoded by synthetic nucleic acid sequences of the
invention is different than the amino acid sequence of the parent click beetle luciferase.
GRver2-GRver5 and GRver5.1 have 1 amino acid substitution (related to a substitution associated with
green light) relative to parent sequence YG#81-6G01; RDver2-RDver5 and RDver5.1 have 4
amino acid substitutions (related to substitutions associated with red light) relative to
YG#81-6G01; RDver5.2 has 5 amino acid substitutions (related to substitutions associated with red light
and improved spectral properties) relative to YG#81-6G01; and RD156-1H9 has 9 amino acid 
substitutions (related to substitutions associated with red light, improved spectral properties and 
improved luminescence intensity) relative to YG#81-6G01 (see Figure 3 for a comparison of the 
amino acid sequences encoded by the synthetic click beetle luciferase sequences). Similarly, in 
Example 3 of the present application, the amino acid sequence of the Renilla luciferase encoded 
by a synthetic nucleic acid sequence is different than the amino acid sequence of the parent 
Renilla luciferase sequence. 

Moreover, the primary amino acid sequences of firefly luciferases and click beetle 
luciferases have common features, and so can be aligned (see Figure 8 in Wood et al., J. Biolum.
Chemi., 4:289 (1989), and Figure 3 in Wood et al., Science, 244:700 (1989); see Evidence
Appendix). Such an alignment, in view of positions that have been substituted in beetle
luciferases, provides direction on what residues may be substituted without altering beetle 
luciferase reporter properties. 

Thus, it is well within the skill of the art worker to predictably substitute amino acids in a 
reporter protein, e.g., substitute up to at least 10% of the residues in a luciferase, and yield a 
variant protein with detectable activity. 

With regard to the Examiner's assertion that it is not routine to screen for multiple 
substitutions or multiple modifications, the Board is requested to consider WO 99/14336, Zhang
et al., supra, and Arnold (Chem. Eng. Sci., 51:5091 (1996)) (see Evidence Appendix). WO
99/14336 discloses the use of recursive mutagenesis to prepare thermostable beetle luciferases,
and the identification of clones with certain properties including luciferase activity. The
identified clones had amino acid substitutions. Zhang et al. disclose the use of directed evolution
coupled with selection to convert a beta-galactosidase to a beta-fucosidase. After iterative cycles
of DNA shuffling and screening for fucosidase activity, the evolved fucosidase gene encoded
8 substitutions (see Figure 13). Arnold discloses the introduction of multiple modifications into a
nucleic acid molecule and screening for particular phenotype(s) of the encoded gene product. 

In response to the undue experimentation alleged to be necessary to prepare variant 
reporters and screen those reporters for activity, the fact that the outcome of such a screening 
program may be unpredictable is precisely why a screening program is carried out. It simply 
cannot reasonably be contended that a program to locate biomolecules with target biological or 
physical properties would not be carried out by the art because the results cannot be predicted in 
advance. 

In fact, the Federal Circuit has explicitly recognized that the need, and methodologies
required, to carry out extensive synthesis and screening programs to locate biomolecules with
particular properties do not constitute undue experimentation. In In re Wands, 8 U.S.P.Q.2d 1400,
1406-1407 (Fed. Cir. 1988), the Court stated:

The nature of monoclonal antibody technology is that it involves screening 
hybridomas to determine which ones secrete antibody with desired characteristics. 
Practitioners of this art are prepared to screen negative hybridomas in order to 
find one that makes the desired antibody.

Likewise, practitioners in the art related to the present application would be well-equipped to
prepare and/or screen variant reporter constructs to identify those with reporter activity. See
also, Hybritech Inc. v. Monoclonal Antibodies Inc., 231 U.S.P.Q. 81, 84 (Fed. Cir. 1986)
(evidence that screening methods used to identify characteristics [of monoclonal antibodies]
were available to art convincing of enablement). Thus, the fact that a given claim may
encompass a variety of molecules is not dispositive of the enablement issue, particularly in an art
area in which the level of skill is very high and in which screening of large numbers of
compounds has been standard practice for at least ten years (Ex parte Forman, 230 U.S.P.Q.2d
456 (Bd. App. 1986)).

At page 7 in the Advisory Action dated March 1, 2007, the Examiner responds that
"Applicants are not claiming the screening methods they are claiming the results of the screening
methods which their own argument admits are unpredictable." However, the claims at issue in In
re Wands and Hybritech were not screening claims, but rather methods of using antibodies with
particular properties. Further, while the outcome of a particular screening program may be 
unpredictable, the Federal Circuit recognized that those programs do not constitute undue 
experimentation. In the present application, the Board is requested to consider that screening 
libraries of molecules based on a parent nucleic acid molecule encoding chloramphenicol 
acetyltransferase, Renilla luciferase, beetle luciferase, beta-lactamase, beta-glucuronidase or 
beta-galactosidase for mutated nucleic acid encoding a chloramphenicol acetyltransferase, 
Renilla luciferase, beetle luciferase, beta-lactamase, beta-glucuronidase or beta-galactosidase, 
irrespective of whether the parent nucleic acid molecule has a wild type or variant sequence or 
whether the parent protein has a wild type or variant sequence, does not constitute undue 
experimentation, as the property to be detected ("desired characteristics") is predictable. 

Claim 67 is directed to a first synthetic nucleic acid molecule comprising at least 300 
nucleotides of a coding region for a beetle luciferase which has at least 90% amino acid sequence 
identity to a beetle luciferase encoded by a wild type nucleic acid sequence. The codon 
composition of the first synthetic nucleic acid molecule is different at more than 25% of the 
codons from that of the wild type nucleic acid sequence and is different than the codon 
composition of a second synthetic nucleic acid molecule which encodes a beetle luciferase which 
has at least 90% amino acid sequence identity to the beetle luciferase encoded by the wild type 
nucleic acid sequence. The codons in the second synthetic nucleic acid molecule that are
different than the codons in the wild type nucleic acid sequence are mammalian high usage 
codons selected to result in the second synthetic nucleic acid molecule having a reduced number 
of a combination of different mammalian transcription factor binding sequences, and optionally a 
reduced number of intron splice sites, poly(A) addition sites or prokaryotic 5' noncoding 
regulatory sequences relative to the wild type nucleic acid sequence. The codons which differ in 
the first synthetic nucleic acid molecule relative to the second synthetic nucleic acid molecule are 
mammalian codons selected to result in the first synthetic nucleic acid molecule having a 
reduced number of a combination of different mammalian transcription factor binding sequences, 
and optionally a reduced number of intron splice sites, poly(A) addition sites or prokaryotic 5' 
noncoding regulatory sequences, that are introduced to the second synthetic nucleic acid 
molecule by selecting the mammalian high usage codons, wherein the mammalian transcription 
factor binding sequences are those present in a database of transcription factor binding 
sequences. 

Thus, even if, assuming for the sake of argument, there are vast numbers of beta- 
glucuronidases, chloramphenicol acetyltransferases and beta-lactamases, Appellant's 
specification, in view of the knowledge and skill of the art worker, enables the beetle luciferases
recited in claim 67. 

With respect to claim 47, which is directed to polynucleotides which hybridize to the 
complement of a particular synthetic nucleic acid molecule of the invention, it is Appellant's 
position that one of skill in the art in possession of Appellant's specification is readily able to 
determine whether a nucleic acid molecule hybridizes under medium stringency conditions to 
Appellant's synthetic polynucleotides, e.g., hybridizes to the complement of SEQ ID NO:9, and
encodes a polypeptide with at least 90% amino acid sequence identity to SEQ ID NO:23. 
Exemplary hybridization conditions are provided at pages 20-21 in the specification. 

In addition, if Appellant's specification enables a nucleic acid molecule that hybridizes 
under high stringency conditions to Appellant's synthetic nucleic acid molecule (note claims 71, 
78-80, and 83-85 are not rejected as lacking enablement under § 112(1)), it is logical to conclude
that it is within the skill of the art worker to determine whether the nucleic acid molecule also
hybridizes under medium stringency conditions to Appellant's synthetic nucleic acid molecules.

Therefore, Appellant's specification fulfills the requirement of 35 U.S.C. § 112, first
paragraph.

III. The 35 U.S.C. § 103(a) Rejections

A) The Applicable Law under 35 U.S.C. § 103(a)

The determination of obviousness under 35 U.S.C. § 103 is a legal conclusion based on 
factual evidence. See Princeton Biochemicals, Inc. v. Beckman Coulter, Inc., 411 F.3d 1332,
1336-37 (Fed. Cir. 2005). The legal conclusion, that a claim is obvious within § 103(a), depends 
on at least four underlying factual issues set forth in Graham v. John Deere Co. of Kansas City, 
383 U.S. 1, 17, 86 S.Ct. 684, 15 L.Ed.2d 545 (1966): (1) the scope and content of the prior art; 
(2) differences between the prior art and the claims at issue; (3) the level of ordinary skill in the 
pertinent art; and (4) evaluation of any relevant secondary considerations. 

The Examiner has the burden under 35 U.S.C. § 103 to establish a prima facie case of 
obviousness. In re Fine , 837 F.2d 1071, 5 U.S.P.Q.2d 1596 (Fed. Cir. 1988). As part of 
establishing a prima facie case of obviousness, the Examiner must show that some objective 
teaching in the prior art or some knowledge generally available to one of ordinary skill in the art 
would lead an individual to combine the relevant teaching of the references. Id.

The M.P.E.P. contains explicit direction that agrees with the court's holding in In re Fine : 

In order for the Examiner to establish a prima facie case of obviousness, three 
base criteria must be met. First, there must be some suggestion or motivation, 
either in the references themselves or in the knowledge generally available to one 
of ordinary skill in the art, to modify the reference or to combine reference 
teachings. Second, there must be a reasonable expectation of success. Finally, 
the prior art reference (or references when combined) must teach or suggest all 
the claim limitations. The teaching or suggestion to make the claimed 
combination and the reasonable expectation of success must both be found in the 
prior art, and not based on applicant's disclosure. M.P.E.P. § 2142 (citing In re
Vaeck, 947 F.2d 488, 20 U.S.P.Q.2d 1438 (Fed. Cir. 1991)).

Moreover, the Examiner must provide specific, objective evidence of record for a finding
of a suggestion or motivation to combine reference teachings and must explain the reasoning by
which the evidence is deemed to support such a finding. In re Sang Su Lee, 277 F.3d 1338, 61
U.S.P.Q.2d 1430 (Fed. Cir. 2002). Further, when making an obviousness rejection based on a
combination, there must be some motivation, suggestion or teaching of the desirability of making
the specific combination made by Applicant. Id. Finally, the Examiner must avoid hindsight. In
re Bond, 910 F.2d 831, 834, 15 U.S.P.Q.2d 1566, 1568 (Fed. Cir. 1990), reh'g denied, 1990 U.S.
App. LEXIS 1997 (Fed. Cir. 1990).



B) The Examiner's Position 

The Examiner rejected claims 1, 3-6, 9, 11-12, 15, 20-21, 24-39, 41-45, 60, 67, 69-70, 81,
86, and 89-95 under 35 U.S.C. § 103(a) as being unpatentable over Sherf et al. (U.S. Patent No. 
5,670,356) in view of Zolotukhin et al. (U.S. Patent No. 5,874,304), Donnelly et al. (WO 
97/47358), Pan et al. ( Nucl. Acids Res. , 27:1094 (1999)), Cornelissen et al. (U.S. Patent No. 
5,952,547), and Hey et al. (U.S. Patent No. 6,169,232). In particular, the Examiner asserts that 
each of the references is drawn to methods of increasing expression in a desired host by altering 
the sequence of the nucleic acid but not the encoded protein in a variety of ways that will lead to 
increases in the protein, that the cited references show that the art was clearly aware that a 
combination of changes can be used to accomplish this goal, and that the art clearly teaches all of 
the claimed modifications and combinations of them with one or more of the others. 

The Examiner also rejected claims 18, 47, 71, 74, 76-78, 80, 82-85, 87-88 and 96, under 
35 U.S.C. § 103(a) as being unpatentable over Sherf et al., Donnelly et al., Zolotukhin et al., Pan 
et al., Cornelissen et al., Hey et al., and further in view of Wood et al. (WO 99/14336). 



C) Appellant's Position 

1. The Rejection of Claims 1, 3-6, 9, 11-12, 15, 20-21, 24-39, 41-45, 60, 67, 69-70, 81, 
86, and 89-94 

a. Discussion of the Cited Art 

In order to minimize potential biological interferences that may complicate the 
interpretation of reporter data, Sherf et al. developed an optimal cytoplasmic form of luciferase 
(column 6, lines 7-9). Sherf et al. disclose a synthetic firefly luciferase gene (luc+), the sequence 
of which was altered primarily to remove the peroxisomal translocation sequence so as to yield a 
cytoplasmic form of the enzyme (abstract; column 2, lines 59-61). Other alterations were the 
removal of 3 internal palindromic sequences, 5 restriction endonuclease sites, 2 glycosylation 
sites, and 5 to 7 transcription factor binding sites that were present in the unmodified sequence. 
Also codons were altered at sequences specified in Table 2 to codons preferred ("more 
common") in mammalian cells, relative to a wild type firefly luciferase gene (luc). Of the twenty 
6 to 30 bp regions which were modified, 6 regions included modifications with a dual purpose, 
i.e., one region was modified to eliminate a glycosylation site and a transcription factor binding 
site that was present in the unmodified sequence, three regions were modified to eliminate a 
transcription factor binding site that was present in the unmodified sequence and improve codon 
usage, one region was modified to eliminate two transcription factor binding sites (but not 
improve codon usage) that were present in the unmodified sequence, and another region was 
modified to improve codon usage and eliminate a restriction endonuclease recognition site. 

Thus, the alterations disclosed in Sherf et al. may alter post-translation steps 
(glycosylation), the translation product (i.e., lack of peroxisomal translocation sequence), RNA 
secondary structure (palindromes, and possibly RNA sequences corresponding to restriction 
endonuclease sites), transcription (TFBS), or sequences unrelated to transcription or translation 
(restriction endonuclease sites). 

Sherf et al. also disclose that a vector encoding Luc+ or Luc was introduced to four 
mammalian cell lines. NIH3T3 and HeLa cells transfected with luc+ DNA had significantly 
higher levels of luciferase activity relative to NIH3T3 and HeLa cells transfected with luc DNA 
(Table 3), while CHO and CV-1 cells transfected with luc+ or luc DNA had comparable 
luciferase activity. However, it is unclear what alterations in luc+ DNA increased luciferase 
activity in mammalian cells, and why those alterations did not uniformly increase luciferase 
activity in all the tested mammalian cells. 

In contrast, a synthetic Renilla luciferase gene of the present invention was expressed at 
significantly higher levels relative to a wild type Renilla luciferase gene in NIH3T3, HeLa, CHO 
and CV-1 cells (Table 10). 

Sherf et al. do not teach or suggest that modification of a parent sequence to remove 
palindromic sequences, restriction endonuclease sites, glycosylation sites, and transcription 
factor binding sites may introduce other undesirable sequences. Nor do Sherf et al. disclose or 
suggest replacing at least 25% of the codons in a parent sequence with selected mammalian 
codons, to reduce transcription factor binding sequences introduced to the parent sequence by 
high usage mammalian codons. 

A humanized version of a green fluorescent protein (GFP) gene is disclosed in 
Zolotukhin et al. in which 88/238 of the codons in the gene were altered (column 13, lines 1-4). 
It is disclosed that codons were altered in order to address the poor translation efficiency of the 
gfp nucleic acid sequence in human cells (column 12, lines 49-51), as an alternative method to 
increase expression by, for instance, insertion of an intron (column 43, lines 26-32) or the 
introduction of a Kozak sequence, although insertion of a Kozak sequence did not significantly 
change expression (column 43, lines 49-60). Zolotukhin et al. do not disclose or suggest that 
codon optimization of a parent sequence may introduce undesirable sequences. 

Donnelly et al. disclose the preparation of synthetic hepatitis C virus (HCV) genes for 
DNA vaccines. In particular, it is disclosed that codons in the corresponding wild type gene that 
are not the most commonly employed in humans, are replaced with an optimal codon. Of note, 
HCV is a pathogen of humans (page 2 of Donnelly et al.) and so due to evolutionary selection, 
HCV sequences are likely at least partially human codon "optimized." 

Donnelly et al. also disclose that if a CG is created by that codon replacement, i.e., the 
third nucleotide in the replaced codon is C and the first nucleotide in the adjacent codon is G, 
a different codon is selected based on Table 5 in Lathe et al. (J. Mol. 
Biol., 183:1 (1985)) (page 17). Once all codon replacements are made, it is disclosed that the 
codon optimized gene is inspected for undesired sequences such as ATTTA sequences, 
inadvertent creation of intron splice sites, and unwanted restriction enzyme sites, which are then 
eliminated by substituting codons (pages 17-18). 
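
For illustration only, the two kinds of inspection attributed to Donnelly et al. above (a CG formed across adjacent codons, and ATTTA sequences) can be sketched as simple string checks. This is a hypothetical sketch, not a procedure taken from Donnelly et al. or from the application; the function names and the assumption that the input is an in-frame coding sequence are illustrative.

```python
def junction_cg_codons(coding_seq):
    """Return 0-based indices of codons whose third base is C
    when the next codon begins with G (a CG spanning two codons).
    Assumes coding_seq starts in frame; trailing partial codons are ignored."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    return [i for i in range(len(codons) - 1)
            if codons[i][2] == "C" and codons[i + 1][0] == "G"]


def find_motif(seq, motif="ATTTA"):
    """Return 0-based start positions of every (possibly overlapping)
    occurrence of motif in seq."""
    seq = seq.upper()
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits
```

A check of this kind only flags positions; which synonymous codon to substitute at a flagged position is a separate choice that the sketch does not attempt.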

The bias away from CG residues during codon optimization in Donnelly et al. would 
reduce overall CG content in the final synthetic sequence unless codon substitution to remove 
undesired sequences resulted in an increase in CG dinucleotides in adjacent codons (thus 
defeating the reasoning behind avoiding CGs in adjacent codons). In that regard, note that the 
synthetic click beetle and Renilla luciferase genes described in the Examples had increased CG 
content relative to the respective parent sequence. 

Donnelly et al. provide no details of the sequence of any undesirable sites including 
intron splice sites which are to be eliminated or how to substitute codons to remove ATTTA 
sequences, splicing sites and restriction enzyme sites. Further, there is no recognition in 
Donnelly et al. that codon optimization may introduce transcription factor binding sequences or 
that transcription factor binding sequences may be removed from sequences. 

To address the poor expression of the merozoite surface protein-1 (msp-1) gene of 
Plasmodium falciparum in heterologous systems (the wild type sequence has a high A/T content 
that prevented stable cloning in E. coli and expression in heterologous systems; abstract), Pan et 
al. describe a synthetic msp-1 gene. Note that the life cycle of P. falciparum includes humans 
and mosquitoes (page 1094). Thus, due to evolutionary selection, P. falciparum sequences are 
likely at least partially human codon "optimized." 

The synthetic gene in Pan et al. was prepared by first back translating the corresponding 
wild type gene using random (not preferred) human codon replacement. One master sequence 
was chosen with an average codon composition found in human coding sequences (page 1095), 
and then the master synthetic sequence was modified via alternate codon replacement to 
eliminate sequences that might be detrimental to efficient transcription and translation, i.e., 
introduce endonuclease cleavage sites to position sites at or near major processing sites for 
msp-1, or remove prokaryotic promoters, poly(A) signals, exon-intron boundaries, prokaryotic 
factor-independent RNA polymerase terminators, inverted repeats (undesirable secondary structures), 
and long runs of purines (transcription termination) (page 1095). 

Notably, Pan et al. did not seek to eliminate transcription factor binding sequences in 
msp-1 and did not recognize that codon optimization may introduce transcription factor binding 
sequences. Nor does Pan et al. disclose the sequences for prokaryotic promoters, poly(A) 
signals, or exon-intron boundaries that are required to be identified for removal. 

In the Background section of Cornelissen et al., it is disclosed that wholesale 
(nonselective) changes in codon usage can introduce cryptic regulatory signals in a gene, thereby 
causing problems in one or more of transcriptional control, RNA processing control, RNA 
transport control, mRNA degradation control, translational control and protein activity control, 
which in turn inhibits or interferes with transcription and/or translation (column 3, lines 8-25). 

Nevertheless, Cornelissen et al. disclose altering DNA encoding a Bt crystal protein 
(insect protein) for improved expression in plants by introducing translationally neutral 
modification(s) in cryptic promoter(s), which can direct site-specific transcription initiation in 
plant cells, and/or abortive intron(s), that inhibit or prevent transcription, nuclear accumulation 
of RNA or nuclear export of RNA (abstract; column 3, line 56-column 4, line 9; column 5, lines 
38-45). Those modifications are the introduction of introns or replacement of codons with others 
encoding the same amino acid (column 3, line 56-column 4, line 9). In particular, it is disclosed 
that a Bt gene is modified by changing sequences with A and T to sequences with G and C that 
encode the same amino acids (column 8, lines 46-57; column 10, lines 14-40; column 17, lines 
27-30), preferably by changing only a few nucleotides and without introducing plant preferred 
codons (column 10, lines 21-29). That is, to accomplish providing a translationally neutral gene, 
less than 10% of nucleotides in a coding region are changed from A/T to G/C (column 10, lines 
16-18) and "[i]nstead of modifying the codon usage of one or more inhibitory zones" in a Bt 
gene, sequence elements are inactivated, preferably by introducing an intron (column 11, lines 
31-35). 

The Examples in Cornelissen et al. disclose methods to identify cryptic promoters and 
introns, and the modification of the bt884 and cryIAb22 sequences (Examples 3-4). bt884 has a 
3' deletion relative to bt2 (bt2 is a wild-type gene; see Hofte et al., Eur. J. Biochem., 161:273 
(1986), cited at column 17, line 29), and cryIAb22 has a 5' deletion relative to bt884 (see SEQ 
ID Nos. 22-23). 

Cornelissen et al. do not disclose or suggest modifying reporter sequences, e.g., luciferase 
sequences, or recognize that codon replacement may introduce undesirable sites. Nor do 
Cornelissen et al. disclose or suggest replacing codons in a gene with mammalian codons. 

Hey et al. disclose altering codons in storage proteins to yield sink protein nucleic acid 
sequences that have Trp codons for Phe codons (to increase the nutritional value of seed) and 
also have a reduction in splice sites, polyA sequences, RNA polymerase termination signals, TA 
and CG doublets and blocks of G or C residues of more than about 4 residues (column 2, lines 
53-67 and column 6, lines 22-25). It is disclosed that a sink protein sequence was back 
translated, codons preferred in maize introduced (column 10, lines 55-66), and restriction 
enzyme sites, splice sites, polyA sequences, RNA polymerase termination signals, TA and CG 
doublets and blocks of G or C residues of more than about 4 residues replaced with second or 
third choice codons (column 11, lines 2-15). 

Hey et al. do not disclose or suggest modifying reporter sequences, or recognize that 
mammalian codon replacement may introduce mammalian transcription factor binding sites that, 
in turn, may be removed. 

b. The Examiner has failed to Make Out a Prima facie Case of Obviousness 
The combination of references does not disclose or suggest Appellant's invention for the 
following reasons. 

i. Cornelissen et al. Teach Away from the Claimed Invention and Therefore Cannot 
be Combined with the Other References' Teachings 

Cornelissen et al. teach away from the claimed invention because Cornelissen et al. teach 
that very few modifications in the coding region are required, and that codon replacement with 
codons used more frequently in a particular host cell is not needed, to substantially alleviate 
expression problems. Cornelissen et al. went so far as to suggest that because only a relatively 
small number of modifications result in a substantial increase of foreign gene expression in 
plants, the modified genes produced in accordance with their invention are unlikely to contain 
newly introduced sequences that interfere themselves with expression of the gene in a plant cell 
environment (column 13, lines 37-42). 

As such, it would be improper to combine Cornelissen et al. with the remaining 
references because Cornelissen et al. teach away from the combination. M.P.E.P. 2145 ("It is 
improper to combine references where the references teach away from their combination."); In re 
Grasselli, 713 F.2d 731, 743, 218 U.S.P.Q. 769, 779 (Fed. Cir. 1983). 

ii. Impermissible Hindsight Cannot be Used to Combine the Teachings of Sherf 
et al. with the Other References 

The combination of references does not disclose or suggest Appellant's invention as each 
reference discloses a different way to modify the coding sequence of a different gene to increase 
expression, i.e., viral genes, a gene from a parasite associated with malaria, an insect toxin gene, 
a storage protein gene, or a reporter gene. That is, Zolotukhin et al. disclose codon modification 
generally throughout a green fluorescent protein gene to codons employed more frequently in 
one organism, and Sherf et al. disclose limited and targeted modification (modifications in 20 
regions of 6 to 30 bp) of a wild type firefly luciferase sequence to introduce or remove cloning 
sites, alter insect codons to mammalian codons, and to remove post-translation modification 
sites, secondary structure, and transcription factor binding sites. Cornelissen et al. disclose 
targeted modification of a toxin gene (modify less than 10% of the nucleotides) to alter Bacillus 
codons to remove sequences that may alter elongation efficiency, e.g., by replacing A and T 
sequences with G and C sequences, and Donnelly et al. describe codon replacement to more 
commonly employed codons combined with further codon substitution to remove CG residues in 
adjacent codons, and subsequent inspection for ATTTA sequences, intron splice sites, and 
unwanted restriction enzyme sites. Hey et al. describe codon replacement in storage protein 
genes to maize codons and reducing restriction enzyme sites, splice sites, polyA sequences, RNA 
polymerase termination signals, TA and CG doublets, and blocks of G or C residues. Pan et al. 
disclose random human codon replacement yielding a population of synthetic sequences with 
codon substitutions, choosing one master synthetic sequence, and then modifying the master 
synthetic sequence via alternate codon replacement to eliminate sequences that might be 
detrimental to efficient transcription and translation, i.e., endonuclease cleavage sites, 
prokaryotic promoters, poly(A) signals, exon-intron boundaries, prokaryotic factor-independent 
RNA polymerase terminators, inverted repeats, and long runs of purines. 

Thus, while there is a general teaching in the combination of cited documents to alter 
codons and/or remove certain undesired sequences in a selected sequence, none of the cited 
documents teaches or suggests that codon alterations, optionally in conjunction with removal of 
other disclosed sequences, i.e., ATTTA sequences, splice sites, endonuclease cleavage sites, 
prokaryotic promoters, poly(A) signals, prokaryotic factor-independent RNA polymerase 
terminators, inverted repeats, long runs of purines, RNA polymerase termination signals, TA and 
CG doublets and blocks of G or C residues of more than about 4 residues, may create TFBS. 
The Examiner has acknowledged that none of the cited documents explicitly teaches that codon 
replacements may create unwanted TFBS (page 13 of the Office Action dated September 13, 
2006). Moreover, none of the cited documents discloses or suggests iterative removal of TFBS 
from a codon altered gene of any type. 

And although one of skill in the art in possession of the cited documents may be 
motivated to alter the codons of a particular sequence, there is no direction in the combination of 
cited documents which guides one of skill in the art to Appellant's invention. It is only with 
hindsight, i.e., given Appellant's disclosure as a "road map," that one of skill in the art, picking 
and choosing from the cited documents, may be directed to Appellant's invention. That is, with 
regard to synthetic reporter encoding polynucleotides (claims 1, 3-6, 9, 11-12, 15, 20-21, 24-39, 
41-45, 60, 67, 69-70, 81, 86, and 90-94), one of skill in the art in possession of the cited art, 
would be required to modify a reporter gene (Sherf et al. and Zolotukhin et al.), rather than a 
non-reporter gene (Donnelly et al., Cornelissen et al., Hey et al., and Pan et al.), by codon 
replacement over at least 25% of the open reading frame (Zolotukhin et al., Donnelly et al., Hey 
et al. and Pan et al.) rather than by alterations in a limited portion of an open reading frame 
(Sherf et al. and Cornelissen et al.), with subsequent additional directed alterations to the 
nucleotide sequence to remove undesired sequences introduced by human to optimal human, 
random to maize, or random human to other human, codon replacements (Donnelly et al., Hey et 
al. and Pan et al., respectively) rather than a lack of substantive subsequent additional directed 
alterations to the nucleotide sequence to remove undesired sequences introduced by codon 
replacement or concurrently with other alterations (Sherf et al., Zolotukhin et al., and 
Cornelissen et al.). In some instances, codons are further replaced to remove ATTTA sequences, 
splice sites, endonuclease cleavage sites, prokaryotic promoters, poly(A) signals, prokaryotic 
factor-independent RNA polymerase terminators, inverted repeats, long runs of purines, RNA 
polymerase termination signals, TA and CG doublets and blocks of G or C residues of more than 
about 4 residues (Donnelly et al., Pan et al. and Hey et al.). 

Moreover, the problem in the art (improved expression of genes in heterologous systems) 
has been "solved" by each of the cited documents (in different ways) and so one of skill in the art 
would not look to combining the references in a particular way in the absence of Appellant's 
disclosure. 

At best, the cited documents may suggest modifying a reporter gene over a large portion 
of the open reading frame with a view to generally remove undesired sequences introduced by 
codon replacement with preferred mammalian codons, and then with other mammalian codons. 
But the claims at issue are not directed to such a modified reporter gene. 

Assuming, for the sake of argument, that the cited documents may provide the motivation 
to repeat the alterations disclosed therein in a different gene, or make additional alterations in the 
same gene, as there is no teaching or suggestion of Appellant's invention in the cited documents 
taken alone or in combination, the cited documents do not provide the motivation to arrive at 
Appellant's invention. That is because none of the cited documents recognizes that replacement 
of nonmammalian codons in a parent nucleotide sequence with mammalian codons introduces 
mammalian transcription factor binding sites not found in the parent nucleotide sequence. 
Moreover, none of the cited documents suggests that a polynucleotide that is modified by 
replacement of nonmammalian codons with mammalian codons be further modified by 
replacement with other, lower usage mammalian codons to reduce the number of introduced 
mammalian transcription factor binding sites. 

The Board is requested to consider that after codon optimization in conjunction with 
removal of non-transcription factor binding sites in click beetle and Renilla luciferase nucleotide 
sequences, Appellant identified about 100 and about 60 transcription factor binding sequences, 
respectively. Further codon replacement to remove those sequences yielded synthetic click 
beetle and Renilla luciferase sequences with about 50 and about 20 new transcription factor 
binding sites, respectively, i.e., they were introduced by codon replacement (Examples 1 and 3). 
The vast majority of the introduced sequences were subsequently removed. 

Moreover, one of skill in the art in possession of the cited documents would be required 
to select and identify at least transcription factor binding sites (Sherf et al. and possibly 
Cornelissen et al.), promoter sequences (Pan et al.), splice sites (Donnelly et al., Hey et al., and 
Pan et al.), and polyA sites (Hey et al. and Pan et al.) as sequences for removal by codon 
replacement, although the cited art would not lead the art worker to identify this specific 
combination of sites for alteration. Rather, Sherf et al. teach removal of peroxisomal targeting 
sequences, internal palindromic sequences, restriction endonuclease sites, glycosylation sites, 
and transcription factor binding sites, Donnelly et al. disclose removing ATTTA sequences, 
inadvertent creation of intron splice sites, and unwanted restriction enzyme sites, Hey et al. 
disclose removing splice sites, polyA sequences, RNA polymerase termination signals, TA and 
CG doublets, and blocks of G or C residues, and Pan et al. disclose removing endonuclease 
cleavage sites, prokaryotic promoters, poly(A) signals, exon-intron boundaries, prokaryotic 
factor-independent RNA polymerase terminators, inverted repeats, and long runs of purines. 
Cornelissen et al. modify Bt sequences by changing A and T nucleotides to G and C nucleotides. 
Zolotukhin et al. do not even mention removal of a set of specific regulatory sequences in a 
nonnative codon modified coding region. 

Further, none of the cited documents discloses or suggests the use of software to identify 
mammalian transcription factor binding sequences in a database of transcription factor binding 
sequences (claim 95). 

In addition, one of ordinary skill in the art in possession of the cited art would have no 
reasonable expectation that any particular set of changes may improve activity or be otherwise 
desirable in a gene that is to be expressed in a highly evolutionarily distinct cell. For instance, an 
increase in codon substitutions and a decrease in RNA destabilization sequences in a synthetic 
gene do not necessarily improve the transcriptional characteristics of the synthetic gene relative 
to a reference gene, as codons dictate amino acids to insert during translation and RNA 
destabilization sequences destabilize transcribed RNA sequences, i.e., a post-transcription 
process. In addition, it is unclear what changes to HCV genes (Donnelly et al.), msp-1 gene (Pan 
et al.) or luc (Sherf et al.) sequence result in improved activity in a heterologous host and why 
replacement of codons in luc with codons preferred in mammals and other alterations which 
resulted in luc+ did not improve luciferase activity in all mammalian cells which expressed Luc+. 

The Examiner asserts that while it is true that none of the cited documents explicitly 
teach that codon replacements may create unwanted transcription factor binding sequences not 
present in the wild type sequence, Hey et al., Donnelly et al., and Pan et al. all show that the art 
recognized that codon modifications can introduce sequences which are unwanted within the 
synthetic gene and that additional codon modifications can decrease the introduction of those 
sequences, and that Sherf et al. clearly teach that the presence of transcription factor binding 
sequences within a reporter gene is an unwanted feature as it may interfere with the desired 
genetic neutrality of the reporter gene. The Examiner also asserts that it is obvious on its face 
that the more changes one makes, the higher the chances that such a detrimental sequence will be 
introduced, and that the remaining art clearly would have motivated one of skill in the art to 
make more substantial changes in codon preference within the luciferase of Sherf et al. 

The Examiner's assertions are contradictory. Why would one of skill in the art make 
more changes that increase the chances that a detrimental sequence is introduced? Moreover, the 
template genes in Donnelly et al. and Pan et al. were already at least partially optimized for 
expression in a desired host, e.g., humans, as HCV and P. falciparum replicate and/or express 
genes in humans. Further, Hey et al. teach changing the function of the protein encoded by the 
maize codon modified sequences. 

In addition, it is likely relatively straightforward to remove a functional ATTTA 
sequence, splice site, restriction enzyme site, prokaryotic promoter sequence, poly(A) signal, 
RNA polymerase termination signals, prokaryotic factor-independent RNA polymerase 
terminator sequence or inverted repeat, or remove long runs of purines, TA and CG doublets, and 
blocks of G or C residues of more than about 4 residues (sequences disclosed as desirable to alter 
in Donnelly et al., Pan et al., and Hey et al.). In particular, perhaps only a single nucleotide 
replacement in a codon which forms part of an ATTTA sequence, intron splice site, restriction 
enzyme site, prokaryotic promoter, poly(A) signal, RNA polymerase termination signals, 
prokaryotic factor-independent RNA polymerase terminator, inverted repeat, long run of purines, 
TA and CG doublet, or block of G or C residues, without reference to adjacent sequences, may 
accomplish the removal of those undesired sequences. 

In contrast, to remove a plurality of transcription factor binding sites, optionally in 
conjunction with other classes of sequences, by replacing codons, those modifications are 
selected in context, i.e., with reference to how those modifications impact adjacent sequences. 

iii. The Examiner has used the Incorrect "Obvious-to-Try" Standard 

In making the obviousness rejection, the Examiner clearly relies upon the discredited 
"obvious-to-try" standard. In re O'Farrell, 7 U.S.P.Q.2d 1673 (Fed. Cir. 1988), outlines when an 
invention is obvious, and therefore unpatentable, versus when an invention is obvious-to-try, and 
therefore patentable. The Court noted two instances in which a claimed invention is only 
obvious-to-try. First, an invention is merely obvious-to-try (and therefore patentable) if it is 
necessary to try each of numerous possible choices until one possibly arrived at a successful 
result, where the prior art gave no direction as to which of many possible choices is likely to be 
successful. 7 U.S.P.Q.2d at 1681 (citations omitted). Second, an invention is merely obvious-to- 
try (and therefore patentable) where the prior art gives only general guidance as to the particular 
form of the claimed invention or how to achieve it. At least the first of these situations applies 
here. 

Specifically, the secondary references fail to provide any direction as to which of many 
possible choices of regulatory sequences in a parent nucleic acid sequence to alter but for the 
specific types of sequences disclosed in those references. That is, the references fail to 
explicitly teach or suggest a reduction in the number of a combination of TFBS, and optionally 
intron splice sequences, poly(A) addition sequences, and promoter sequences. Consequently, the 
teachings of the secondary references fail to provide any meaningful guidance with respect to the 
presently claimed invention. 

iv. When Combined, the References do not Teach or Suggest all the Claim 
Limitations 

With respect to claims 1 and 67, the Board is requested to consider that none of the cited 
references teaches or suggests that the codons which differ in a first synthetic nucleic acid 
molecule relative to a second synthetic nucleic acid molecule are mammalian codons selected to 
result in the first synthetic nucleic acid molecule having a reduced number of a combination of 
different mammalian TFBS, and optionally a reduced number of intron splice sites, poly(A) 
addition sites or prokaryotic 5' noncoding regulatory sequences. Those sites in the second 
synthetic nucleic acid molecule are the result of replacing mammalian high usage codons for 
codons in a wild type nucleic acid molecule encoding a reporter polypeptide, e.g., a synthetic 
nucleic acid molecule encoding a luciferase. 

None of the cited references teach or suggest a synthetic nucleic acid molecule encoding 
a reporter polypeptide replacing codons in the second synthetic nucleic acid molecule with 
mammalian codons selected to reduce the number of known mammalian TFBS (claims 91-92). 
Claim 92 is directed to a first synthetic nucleic acid molecule comprising at least 300 nucleotides 
of a coding region for a reporter polypeptide which has at least 90% amino acid sequence 
identity to a reporter polypeptide encoded by a wild type nucleic acid sequence, where the first 
synthetic nucleic acid molecule is prepared by replacing codons in the wild type nucleic acid 
molecule with mammalian high usage codons, yielding a second synthetic nucleic acid molecule, 
and replacing codons in the second synthetic nucleic acid molecule with mammalian codons 
selected to reduce the number of a combination of different, known mammalian transcription 
factor binding sites, yielding the first synthetic nucleic acid molecule. 

And with regard to claim 95, none of the cited references teach or suggest identifying in a 
wild type or second synthetic nucleic acid sequence intron splice sites are selected from 
AGGTRAGT, AGGTRAG, GGTRAGT or YNCAGG, poly(A) addition sites having AATAAA, 
prokaryotic 5' noncoding regulatory sequences with TATAAT, or AGGA or GGAG if a 
methionine codon is within 12 bases 3' of the AGGA or GGAG, and mammalian transcription 
factor binding sequences are in a database of transcription factor binding sequences, mutant 
transcription factor binding sequences and consensus transcription factor binding sequences, that 
are identified under parameters that allow for partial ambiguity with sequences in the database. 
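
For illustration only, the fixed motifs recited above can be located with a short scan that expands the ambiguity codes R (A or G), Y (C or T) and N (any base). This is a hypothetical sketch, not Appellant's software: the names are illustrative, "within 12 bases 3'" is assumed to mean that an ATG lies wholly inside the 12 bases following the AGGA or GGAG, reading frame is ignored, and matching against a transcription factor binding sequence database is not covered.

```python
import re

# Ambiguity codes used by the recited motifs, expanded to regex classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

# Motifs as recited in the text; the labels are illustrative.
MOTIFS = [("splice", "AGGTRAGT"), ("splice", "AGGTRAG"),
          ("splice", "GGTRAGT"), ("splice", "YNCAGG"),
          ("polyA", "AATAAA"), ("promoter", "TATAAT")]


def scan(seq, motifs=MOTIFS):
    """Return sorted (position, label, matched_text) for every motif hit.
    A lookahead is used so that overlapping hits are all reported."""
    seq = seq.upper()
    hits = []
    for label, motif in motifs:
        body = "".join(IUPAC[base] for base in motif)
        for m in re.finditer("(?=(" + body + "))", seq):
            hits.append((m.start(), label, m.group(1)))
    return sorted(hits)


def rbs_like(seq, window=12):
    """Return (position, motif) for AGGA or GGAG followed by an ATG that
    lies wholly within the next `window` bases (assumed reading of the text)."""
    seq = seq.upper()
    hits = []
    for m in re.finditer("(?=(AGGA|GGAG))", seq):
        end = m.start() + 4
        if "ATG" in seq[end:end + window]:
            hits.append((m.start(), m.group(1)))
    return hits
```

Exact pattern matching of this kind does not implement the "partial ambiguity" database search that the text describes for transcription factor binding sequences; that step would need a scoring scheme the source does not specify.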

As such, a prima facie case of obviousness has not been made out for claims 1, 3-6, 9, 
11-12, 15, 20-21, 24-39, 41-45, 60, 67, 69-70, 81, 86, and 90-94. 



2. The Rejection of Claims 18, 47, 71, 74, 76-78, 80, 82-85, 87-88, and 96 

a. Discussion of the Cited Art 

Sherf et al., Zolotukhin et al., Donnelly et al., Pan et al., Cornelissen et al. and Hey et al. 
are discussed above. Wood et al. disclose thermostable beetle luciferases and a method to 
prepare those luciferases. It is disclosed that the thermostable beetle luciferases have a plurality 
of amino acid substitutions relative to a wild type beetle luciferase, and that they may be 
prepared by iterative mutagenesis and selection methods. 

b. The Examiner has failed to Make Out a Prima facie Case of Obviousness 

The combination of references does not disclose or suggest Appellant's invention for the 
following reasons. 

i. Cornelissen et al. Teach Away from the Claimed Invention and Therefore 
Cannot be Combined with the Other References' Teachings 

Cornelissen et al. teach away from the claimed invention because Cornelissen et al. teach 
that very few modifications in the coding region are required, and that codon replacement with 
codons used more frequently in a particular host cell is not needed, to substantially alleviate 
expression problems. Cornelissen et al. went so far as to suggest that because only a relatively 
small number of modifications result in a substantial increase of foreign gene expression in 
plants, the modified genes produced in accordance with their invention are unlikely to contain 
newly introduced sequences that interfere themselves with expression of the gene in a plant cell 
environment (column 13, lines 37-42). 

As such, it would be improper to combine Cornelissen et al. with the remaining 
references because Cornelissen et al. teach away from the combination. M.P.E.P. 2145 ("It is 
improper to combine references where the references teach away from their combination."); In re 
Grasselli, 713 F.2d 731, 743, 218 U.S.P.Q. 769, 779 (Fed. Cir. 1983). 

ii. Impermissible Hindsight Cannot be Used to Combine the Teachings of Sherf 
et al. with the Other References 

The combination of references does not disclose or suggest Appellant's invention as each 
reference discloses a different way to modify the coding sequence of a different gene. While 
there is a general teaching in the combination of cited documents to alter codons and/or remove 
certain undesired sequences in a selected sequence, none of the cited documents teaches or 
suggests that codon alterations, optionally in conjunction with removal of ATTTA sequences, 
splice sites, endonuclease cleavage sites, prokaryotic promoters, poly(A) signals, prokaryotic 
factor-independent RNA polymerase terminators, inverted repeats, long runs of purines, RNA 
polymerase termination signals, TA and CG doublets and blocks of G or C residues of more than 
about 4 residues, may create TFBS. The Examiner has acknowledged that none of the cited 
documents explicitly teach that codon replacements may create unwanted TFBS (page 13 of the 
Office Action dated September 13, 2006). Moreover, none of the cited documents discloses or 
suggests reiterative removal of TFBS from a codon altered gene of any type. 

And although one of skill in the art in possession of the cited documents may be 
motivated to alter the codons of a particular sequence, there is no direction in the combination of 
cited documents which directs one of skill in the art to Appellant's invention. It is only with 
hindsight, i.e., with knowledge of Appellant's invention, that one of skill in the art, picking and 
choosing from the cited documents, may be directed to Appellant's invention. 

With regard to Appellant's synthetic luciferase encoding polynucleotides (claims 18, 47, 
71, 74, 76-78, 80, 82-85, 87-88, and 96), one of skill in the art in possession of the cited art 
would be required to modify a luciferase gene (Sherf et al. and Wood et al.), rather than a 
non-luciferase gene (Zolotukhin et al., Donnelly et al., Cornelissen et al., Hey et al., and Pan et al.), 
by codon replacement over at least 25% of an open reading frame (Zolotukhin et al., Donnelly et 
al., Hey et al. and Pan et al.) rather than by alterations in a portion of an open reading frame 
(Sherf et al. and Cornelissen et al.) or by mutagenesis and selection (Wood et al.), with 
subsequent additional directed alterations to the nucleotide sequence to remove undesired 
sequences introduced by human to optimal human, random to maize, or random human to other 
human, codon replacements (Donnelly et al., Hey et al. and Pan et al.) rather than a lack of 
substantive subsequent additional directed alterations to the nucleotide sequence to remove 
undesired sequences introduced by codon replacement or concurrently with other alterations, or 
via selection (Sherf et al., Zolotukhin et al., Cornelissen et al., and Wood et al.). In some 
instances, codons are further replaced to remove ATTTA sequences, splice sites, endonuclease 
cleavage sites, prokaryotic promoters, poly(A) signals, prokaryotic factor-independent RNA 
polymerase terminators, inverted repeats, long runs of purines, RNA polymerase termination 
signals, TA and CG doublets and blocks of G or C residues of more than about 4 residues 
(Donnelly et al., Pan et al. and Hey et al.). 

Even assuming, for the sake of argument, that the cited documents may provide the motivation 
to repeat the alterations disclosed therein in a different gene, or to make additional alterations in the 
same gene, there is no teaching or suggestion of Appellant's invention in the cited documents 
taken alone or in combination, and so the cited documents do not provide the motivation to arrive at 
Appellant's invention. That is because none of the cited documents recognizes that replacement 
of nonmammalian codons in a parent polynucleotide with mammalian codons introduces 
mammalian transcription factor binding sites not found in the parent sequence. Moreover, none 
of the cited documents suggests that a polynucleotide that is modified by replacement of 
nonmammalian codons with mammalian codons be further modified by replacement with other, 
lower usage mammalian codons to reduce the number of introduced mammalian transcription 
factor binding sites. 

iii. The Examiner has used the Incorrect "Obvious-to-Try" Standard 

In making the obviousness rejection, the Examiner clearly relies upon the discredited 
"obvious-to-try" standard. In re O'Farrell, 7 U.S.P.Q.2d 1673 (Fed. Cir. 1988), outlines when an 
invention is obvious, and therefore unpatentable, versus when an invention is obvious-to-try, and 
therefore patentable. The Court noted two instances in which a claimed invention is only 
obvious-to-try. First, an invention is merely obvious-to-try (and therefore patentable) if it is 
necessary to try each of numerous possible choices until one possibly arrived at a successful 
result, where the prior art gave no direction as to which of many possible choices is likely to be 
successful. 7 U.S.P.Q.2d at 1681 (citations omitted). Second, an invention is merely obvious-to- 
try (and therefore patentable) where the prior art gives only general guidance as to the particular 
form of the claimed invention or how to achieve it. At least the first of these situations applies 
here. 

Specifically, the secondary references fail to provide any direction as to which of many 
possible choices of regulatory sequences in a parent nucleic acid sequence to alter but for the 
specific types of sequences disclosed in those references. That is, the references fail to 
explicitly teach or suggest a reduction in the number of a combination of TFBS, and optionally 
intron splice sequences, poly(A) addition sequences, and promoter sequences. Consequently, the 
teachings of the secondary references fail to provide any meaningful guidance with respect to the 
presently claimed invention. 

iv. When Combined, the References do not Teach or Suggest all the Claim 
Limitations 

With regard to claims 18, 47 and 78, none of the cited documents discloses or suggests a 
synthetic nucleic acid molecule comprising SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, or 
SEQ ID NO:297, or a nucleic acid molecule which is capable of hybridizing thereto under high 
stringency conditions, or the complement of the hybridizable nucleic acid molecule which 
encodes a luciferase (claim 18); a first polynucleotide which hybridizes under medium 
stringency hybridization conditions to SEQ ID NO:9, SEQ ID NO:18, SEQ ID NO:297, SEQ ID 
NO:301, or the complement thereof, and comprises an open reading frame encoding a beetle 
luciferase polypeptide which has at least 90% amino acid sequence identity to a luciferase having 
SEQ ID NO:23 encoded by a corresponding wild type nucleic acid sequence having SEQ ID 
NO:1 (claim 47); or a polynucleotide which hybridizes under medium stringency hybridization 
conditions to SEQ ID NO:9 or SEQ ID NO:297, or the complement thereof, and comprises an 
open reading frame encoding a luciferase polypeptide which has at least 90% amino acid 
sequence identity to a luciferase encoded by a parent nucleic acid sequence having SEQ ID NO:2 
(claim 78). In addition, none of the cited documents discloses or suggests a synthetic nucleic acid 
molecule comprising at least 300 nucleotides of a coding region for a luciferase which has at 
least 90% amino acid sequence identity to a luciferase encoded by a parent nucleic acid sequence 
having SEQ ID NO:2, where the codon composition of the synthetic nucleic acid molecule is 
different from that of the parent nucleic acid sequence and is different than the mammalian high 
usage codon composition of a second synthetic nucleic acid molecule which encodes a luciferase 
which has at least 90% amino acid sequence identity to the luciferase encoded by the parent 
nucleic acid sequence, where the mammalian high usage codons are selected to result in the 
second synthetic nucleic acid molecule having a reduced number of a combination of different 
mammalian transcription factor binding sequences, and optionally a reduced number of intron 
splice sites, poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences relative to 
the parent nucleic acid sequence, and the codons which differ in the first synthetic nucleic acid 
molecule relative to the second synthetic nucleic acid molecule are mammalian codons selected 
to result in the first synthetic nucleic acid molecule having a reduced number of the combination 
that are introduced to the second synthetic nucleic acid molecule by selecting the mammalian 
high usage codons, and where the mammalian transcription factor binding sequences are those 
present in a database of transcription factor binding sequences (claim 74), e.g., a synthetic 
luciferase encoding nucleic acid molecule where the mammalian transcription factor binding 
sequences, intron splice sites, poly(A) addition sites and prokaryotic 5' noncoding regulatory 
sequences in the parent nucleic acid sequence or the second synthetic nucleic acid sequence are 
identified with software, where the identified intron splice sites are selected from AGGTRAGT, 
AGGTRAG, GGTRAGT or YNCAGG, the identified poly(A) addition sites have AATAAA, the 
identified prokaryotic 5' noncoding regulatory sequences are selected from TATAAT, or AGGA 
or GGAG if a methionine codon is within 12 bases 3' of the AGGA or GGAG, and the identified 
mammalian transcription factor binding sequences are in a database of transcription factor 
binding sequences, mutant transcription factor binding sequences and consensus transcription 
factor binding sequences, and identified under parameters that allow for partial ambiguity with 
sequences in the database, where the codons are selected to reduce the number of identified 
sequences or sites, and where the first synthetic nucleic acid molecule has fewer mammalian 
transcription factor binding sequences than the second synthetic nucleic acid molecule which has 
fewer mammalian transcription factor binding sequences than the parent nucleic acid sequence 
(claim 96). 

Finally, none of the cited documents discloses or suggests a first polynucleotide which 
hybridizes under high stringency hybridization conditions to SEQ ID NO:9, SEQ ID NO:18, 
SEQ ID NO:297, SEQ ID NO:301, or the complement thereof, and comprises an open reading 
frame encoding a luciferase polypeptide which has at least 90% amino acid sequence identity to 
a beetle luciferase having SEQ ID NO:23 encoded by a corresponding wild type nucleic acid 
sequence, or a first polynucleotide which hybridizes under high stringency hybridization 
conditions to SEQ ID NO:9 or SEQ ID NO:297, or the complement thereof, and comprises an 
open reading frame encoding a luciferase polypeptide which has at least 90% amino acid 
sequence identity to a polypeptide encoded by a parent nucleic acid sequence having SEQ ID 
NO:2 (claims 83 and 84). 

As such, a prima facie case of obviousness has not been made out for claims 18, 47, 71, 
74, 76-78, 80, 82-85, 87-88, and 96. 

8. SUMMARY 

It is respectfully submitted that the specification and claims of the present application 
satisfy the requirements of 35 U.S.C. § 112, first and second paragraphs, and that the claims are 
not obvious over the cited art. Therefore, reversal of the rejections and allowance of the pending 
claims is respectfully requested. 

Respectfully submitted, 
KEITH V. WOOD ET AL. 



By their Representatives, 

SCHWEGMAN, LUNDBERG, WOESSNER & KLUTH, P.A. 
P.O. Box 2938 
Minneapolis, MN 55402 
(612) 373-6959 




Date: August 23, 2007 

CLAIMS APPENDIX 



1. A first synthetic nucleic acid molecule comprising at least 300 nucleotides of a coding 
region for a reporter polypeptide which has at least 90% amino acid sequence identity to a 
reporter polypeptide encoded by a wild type nucleic acid sequence, wherein the codon 
composition of the first synthetic nucleic acid molecule is different at more than 25% of the 
codons from that of the wild type nucleic acid sequence and is different than the codon 
composition of a second synthetic nucleic acid molecule which encodes a reporter polypeptide 
which has at least 90% amino acid sequence identity to the reporter polypeptide encoded by the 
wild type nucleic acid sequence, wherein the codons in the second synthetic nucleic acid 
molecule that are different than the codons in the wild type nucleic acid sequence are 
mammalian high usage codons selected to result in the second synthetic nucleic acid molecule 
having a reduced number of a combination of different mammalian transcription factor binding 
sequences, and optionally a reduced number of intron splice sites, poly(A) addition sites or 
prokaryotic 5' noncoding regulatory sequences relative to the wild type nucleic acid sequence, 
wherein the codons which differ in the first synthetic nucleic acid molecule relative to the second 
synthetic nucleic acid molecule are mammalian codons selected to result in the first synthetic 
nucleic acid molecule having a reduced number of a combination of different mammalian 
transcription factor binding sequences, and optionally a reduced number of intron splice sites, 
poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences, that are introduced to 
the second synthetic nucleic acid molecule by selecting the mammalian high usage codons, 
wherein the mammalian transcription factor binding sequences are those present in a database of 
transcription factor binding sequences, wherein the wild type nucleic acid sequence encodes 
chloramphenicol acetyltransferase, Renilla luciferase, beetle luciferase, beta-lactamase, beta- 
glucuronidase or beta-galactosidase. 

3. The first synthetic nucleic acid molecule of claim 1 wherein the codon composition of the 
first synthetic nucleic acid molecule differs from the wild type nucleic acid sequence at more 
than 35% of the codons. 

4. The first synthetic nucleic acid molecule of claim 1 wherein the codon composition of the 
first synthetic nucleic acid molecule differs from the wild type nucleic acid sequence at more 
than 45% of the codons. 

5. The first synthetic nucleic acid molecule of claim 1 wherein the codon composition of the 
first synthetic nucleic acid molecule differs from the wild type nucleic acid sequence at more 
than 55% of the codons. 

6. The first synthetic nucleic acid molecule of claim 1 wherein the majority of codons which 
differ are ones that are preferred codons of a desired host cell. 

9. The first synthetic nucleic acid molecule of claim 1 which encodes a luciferase. 

11. The first synthetic nucleic acid molecule of claim 9 wherein the wild type nucleic acid 
sequence encodes a beetle luciferase. 

12. The first synthetic nucleic acid molecule of claim 11 wherein the first synthetic nucleic 
acid molecule encodes the amino acid valine at position 224. 

15. The first synthetic nucleic acid molecule of claim 1 or 9 wherein the majority of codons 
which differ in the second synthetic nucleic acid molecule are those which are preferred codons 
in humans. 

18. A synthetic nucleic acid molecule comprising SEQ ID NO:7 (GRver5), SEQ ID NO:8 
(GRver6), SEQ ID NO:9 (GRver5.1), or SEQ ID NO:297 (GRver5.1), or a nucleic acid molecule 
which is capable of hybridizing thereto under high stringency conditions, or the complement of 
the hybridizable nucleic acid molecule which encodes a luciferase. 

20. The first synthetic nucleic acid molecule of claim 15 wherein the majority of codons 
which differ are the human codons CGC, CTG, TCT, AGC, ACC, CCA, CCT, GCC, GGC, 
GTG, ATC, ATT, AAG, AAC, CAG, CAC, GAG, GAC, TAC, TGC and TTC. 

21. The first synthetic nucleic acid molecule of claim 15 wherein the majority of codons 
which differ are the human codons CGC, CTG, TCT, ACC, CCA, GCC, GGC, GTC, and ATC 
or codons CGT, TTG, AGC, ACT, CCT, GCT, GGT, GTG and ATT. 

24. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule is expressed in a mammalian host cell at a level which is greater than that of the 
wild type nucleic acid sequence. 

25. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of CTG or TTG leucine-encoding codons. 

26. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of GTG or GTC valine-encoding codons. 

27. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of GGC or GGT glycine-encoding codons. 

28. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of ATC or ATT isoleucine-encoding codons. 

29. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of CCA or CCT proline-encoding codons. 

30. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of CGC or CGT arginine-encoding codons. 

31. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of AGC or TCT serine-encoding codons. 



32. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of ACC or ACT threonine-encoding codons. 

33. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule has an increased number of GCC or GCT alanine-encoding codons. 

34. The first synthetic nucleic acid molecule of claim 1 wherein the codons in the first 
synthetic nucleic acid molecule which differ encode the same amino acids as the corresponding 
codons in the wild type nucleic acid sequence. 

35. A plasmid comprising the first synthetic nucleic acid molecule of claim 1. 

36. An expression vector comprising the first synthetic nucleic acid molecule of claim 1 
linked to a promoter functional in a cell. 

37. The expression vector of claim 36 wherein the first synthetic nucleic acid molecule is 
operatively linked to a Kozak consensus sequence. 

38. The expression vector of claim 36 wherein the promoter is functional in a mammalian 
cell. 

39. The expression vector of claim 36 wherein the promoter is functional in a human cell. 

41. The expression vector of claim 36 wherein the expression vector further comprises a 
multiple cloning site. 

42. The expression vector of claim 41 wherein the expression vector comprises a multiple 
cloning site positioned between the promoter and the first synthetic nucleic acid molecule. 

43. The expression vector of claim 41 wherein the expression vector comprises a multiple 
cloning site positioned downstream from the first synthetic nucleic acid molecule. 

44. An isolated host cell comprising the expression vector of claim 36. 

45. A kit comprising, in suitable container means, the expression vector of claim 36, wherein 
the first synthetic nucleic acid molecule encodes a reporter molecule. 

47. A first polynucleotide which hybridizes under medium stringency hybridization 
conditions to SEQ ID NO:9 (GRver5.1), SEQ ID NO:18 (RD156-1H9), SEQ ID NO:297 
(GRver5.1), SEQ ID NO:301 (RD156-1H9), or the complement thereof, and comprises an open 
reading frame encoding a beetle luciferase polypeptide which has at least 90% amino acid 
sequence identity to a luciferase having SEQ ID NO:23 encoded by a corresponding wild type 
nucleic acid sequence having SEQ ID NO:1, wherein the codon composition of the open reading 
frame of the first polynucleotide is different at more than 25% of the codons from that of the 
wild type luciferase nucleic acid sequence and is different than the codon composition of a 
second polynucleotide which encodes a polypeptide which has at least 90% amino acid sequence 
identity to the polypeptide encoded by the wild type nucleic acid sequence, wherein the codons 
in the second polynucleotide that are different than the codons in the wild type nucleic acid 
sequence are mammalian high usage codons selected to result in the second polynucleotide 
having a reduced number of a combination of different mammalian transcription factor binding 
sequences, intron splice sites, poly(A) addition sites or prokaryotic 5' noncoding regulatory 
sequences relative to the wild type nucleic acid sequence, wherein the codons which differ in the 
first polynucleotide relative to the second polynucleotide are mammalian codons selected to 
result in the open reading frame in the first polynucleotide having a reduced number of a 
combination of different mammalian transcription factor binding sequences, and optionally a 
reduced number of intron splice sites, poly(A) addition sites or prokaryotic 5' noncoding 
regulatory sequences, that are introduced to the second polynucleotide by selecting the 
mammalian high usage codons, wherein the mammalian transcription factor binding sequences 
are those present in a database of transcription factor binding sequences. 

60. The first synthetic nucleic acid molecule of claim 1 wherein the first synthetic nucleic 
acid molecule is expressed at a level which is at least 110% of that of the wild type nucleic acid 
sequence in a cell or cell extract under identical conditions. 

67. A first synthetic nucleic acid molecule comprising at least 300 nucleotides of a coding 
region for a luciferase which has at least 90% amino acid sequence identity to a reporter 
polypeptide encoded by a wild type beetle luciferase nucleic acid sequence, wherein the codon 
composition of the first synthetic nucleic acid molecule is different at more than 25% of the 
codons from that of the wild type nucleic acid sequence and is different than the codon 
composition of a second synthetic nucleic acid molecule which encodes a luciferase which has at 
least 90% amino acid sequence identity to the luciferase encoded by the wild type nucleic acid 
sequence, wherein the codons in the second synthetic nucleic acid molecule that are different 
than the codons in the wild type nucleic acid sequence are mammalian high usage codons 
selected to result in the second synthetic nucleic acid molecule having a reduced number of a 
combination of different mammalian transcription factor binding sequences, and optionally a 
reduced number of intron splice sites, poly(A) addition sites or prokaryotic 5' noncoding 
regulatory sequences relative to the wild type nucleic acid sequence, wherein the codons which 
differ in the first synthetic nucleic acid molecule relative to the second synthetic nucleic acid 
molecule are mammalian codons selected so as to result in the first synthetic nucleic acid 
molecule having a reduced number of a combination of different mammalian transcription factor 
binding sequences, and optionally a reduced number of intron splice sites, poly(A) addition sites 
or prokaryotic 5' noncoding regulatory sequences, that are introduced to the second synthetic 
nucleic acid molecule by selecting the mammalian high usage codons, wherein the mammalian 
transcription factor binding sequences are those present in a database of transcription factor 
binding sequences. 

69. The first synthetic nucleic acid molecule of claim 11 or 67 which has 74% or less nucleic 
acid sequence identity to the wild type nucleic acid sequence. 

70. The first synthetic nucleic acid molecule of claim 11 or 67 which has at least 40-fold 
increased expression relative to the wild type nucleic acid sequence. 

71. The first polynucleotide of claim 47 which hybridizes under high stringency 
hybridization conditions to SEQ ID NO:9 (GRver5.1), SEQ ID NO:18 (RD156-1H9), SEQ ID 
NO:297 (GRver5.1), SEQ ID NO:301 (RD156-1H9), or the complement thereof. 

74. A first synthetic nucleic acid molecule comprising at least 300 nucleotides of a coding 
region for a luciferase which has at least 90% amino acid sequence identity to a luciferase 
encoded by a parent nucleic acid sequence having SEQ ID NO:2, wherein the codon composition 
of the synthetic nucleic acid molecule is different at more than 25% of the codons from that of 
the parent nucleic acid sequence and is different than the codon composition of a second 
synthetic nucleic acid molecule which encodes a luciferase which has at least 90% amino acid 
sequence identity to the luciferase encoded by the parent nucleic acid sequence, wherein the 
codons in the second synthetic nucleic acid molecule that are different than the codons in the 
parent nucleic acid sequence are mammalian high usage codons selected to result in the second 
synthetic nucleic acid molecule having a reduced number of a combination of different 
mammalian transcription factor binding sequences, and optionally a reduced number of intron 
splice sites, poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences relative to 
the parent nucleic acid sequence, wherein the codons which differ in the first synthetic nucleic 
acid molecule relative to the second synthetic nucleic acid molecule are mammalian codons 
selected to result in the first synthetic nucleic acid molecule having a reduced number of a 
combination of different mammalian transcription factor binding sequences, and optionally a 
reduced number of intron splice sites, poly(A) addition sites or prokaryotic 5' noncoding 
regulatory sequences, that are introduced to the second synthetic nucleic acid molecule by 
selecting the mammalian high usage codons, wherein the mammalian transcription factor binding 
sequences are those present in a database of transcription factor binding sequences. 

76. The first synthetic nucleic acid molecule of claim 74 wherein the polypeptide encoded by 
the first synthetic nucleic acid molecule has at least 95% amino acid identity to the luciferase 
encoded by the parent nucleic acid sequence. 

77. The first synthetic nucleic acid molecule of claim 74 which has 74% or less nucleic acid 
sequence identity to the parent nucleic acid sequence. 

78. A first polynucleotide which hybridizes under medium stringency hybridization 
conditions to SEQ ID NO:9 (GRver5.1) or SEQ ID NO:297 (GRver5.1), or the complement 
thereof, and comprises an open reading frame encoding a luciferase polypeptide which has at 
least 90% amino acid sequence identity to a luciferase encoded by a parent nucleic acid sequence 
having SEQ ID NO:2, wherein the codon composition of the open reading frame of the first 
polynucleotide is different at more than 25% of the codons from that of the parent nucleic acid 
sequence and is different than the codon composition of a second polynucleotide which encodes 
a polypeptide which has at least 90% amino acid sequence identity to the luciferase encoded by 
the parent nucleic acid sequence, wherein the codons in the second polynucleotide that are 
different than the codons in the parent nucleic acid sequence are mammalian high usage codons 
selected to result in the second polynucleotide having a reduced number of a combination of 
different mammalian transcription factor binding sequences, and optionally a reduced number of 
intron splice sites, poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences 
relative to the parent nucleic acid sequence, wherein the codons which differ in the first 
polynucleotide relative to the second polynucleotide are mammalian codons selected to result in 
the first polynucleotide having a reduced number of a combination of different mammalian 
transcription factor binding sequences, and optionally a reduced number of intron splice sites, 
poly(A) addition sites or prokaryotic 5' noncoding regulatory sequences that are introduced to the 
second polynucleotide by selecting the mammalian high usage codons, wherein the mammalian 
transcription factor binding sequences are those present in a database of transcription factor 
binding sequences. 



80. The first polynucleotide of claim 78 wherein the polypeptide encoded by the first 
polynucleotide has at least 95% amino acid identity to the luciferase encoded by the parent 
nucleic acid molecule. 

81. The first synthetic nucleic acid molecule of claim 1, 67 or 74 wherein the transcription 
factor binding sequence is at least 5 bases in length. 

82. The first polynucleotide of claim 47 or 78 wherein the transcription factor binding 
sequence is at least 5 bases in length. 

83. A first polynucleotide which hybridizes under high stringency hybridization conditions to 
SEQ ID NO:9 (GRver5.1), SEQ ID NO:18 (RD156-1H9), SEQ ID NO:297 (GRver5.1), SEQ ID 
NO:301 (RD156-1H9), or the complement thereof, and comprises an open reading frame 
encoding a luciferase polypeptide which has at least 90% amino acid sequence identity to a 
beetle luciferase having SEQ ID NO:23 encoded by a corresponding wild type nucleic acid 
sequence, wherein the codon composition of the open reading frame of the first polynucleotide is 
different at more than 25% of the codons from that of the wild type nucleic acid sequence. 

84. A first polynucleotide which hybridizes under high stringency hybridization conditions to 
SEQ ID NO:9 (GRver5.1) or SEQ ID NO:297 (GRver5.1), or the complement thereof, and 
comprises an open reading frame encoding a luciferase polypeptide which has at least 90% 
amino acid sequence identity to a polypeptide encoded by a parent nucleic acid sequence having 
SEQ ID NO:2, wherein the codon composition of the open reading frame of the first 
polynucleotide is different at more than 25% of the codons from that of the parent nucleic acid 
sequence. 

85. The first polynucleotide of claim 78 which hybridizes under high stringency conditions. 



86. The first synthetic sequence of claim 1 wherein the selection of mammalian high usage 
codons and mammalian codons also reduces the number of restriction endonuclease sites. 

87. The first polynucleotide of claim 47 or 78 wherein the selection of mammalian high 
usage codons and mammalian codons also reduces the number of restriction endonuclease sites. 

88. The first synthetic nucleic acid molecule of claim 67 or 74 wherein the selection of 
mammalian high usage codons and mammalian codons also reduces the number of restriction 
endonuclease sites. 



90. The first synthetic nucleic acid molecule of claim 1 wherein the mammalian transcription 
factor binding sequences, intron splice sites, poly(A) addition sites and prokaryotic 5' noncoding 
regulatory sequences in the wild type nucleic acid sequence or the second synthetic nucleic acid 
sequence are identified with software, wherein the identified intron splice sites are selected from 
AGGTRAGT, AGGTRAG, GGTRAGT or YNCAGG, the identified poly(A) addition sites have 
AATAAA, the identified prokaryotic 5' noncoding regulatory sequences are selected from 
TATAAT, or AGGA or GGAG if a methionine codon is within 12 bases 3' of the AGGA or 
GGAG, and the identified mammalian transcription factor binding sequences are in a database of 
transcription factor binding sequences, mutant transcription factor binding sequences and 
consensus transcription factor binding sequences, and identified under parameters that allow for 
partial ambiguity with sequences in the database, wherein the codons are selected to reduce the 
number of identified sequences or sites, and wherein the first synthetic nucleic acid molecule has 
fewer mammalian transcription factor binding sequences than the second synthetic nucleic acid 
molecule which has fewer mammalian transcription factor binding sequences than the wild type 
nucleic acid sequence. 
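
The software identification step recited in claim 90 (and repeated in claims 95 and 96) can be pictured as a pattern scan over the coding sequence. The sketch below is illustrative only and is not the software used by the applicant. It assumes the standard IUPAC meanings R = A or G, Y = C or T and N = any base, and it reads "within 12 bases 3'" as requiring an ATG that lies wholly inside the 12 bases immediately following the AGGA or GGAG. All names are hypothetical, and the transcription factor database lookup is omitted.

```python
import re

# IUPAC ambiguity codes used by the motifs recited in the claim.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

SPLICE_SITES = ("AGGTRAGT", "AGGTRAG", "GGTRAGT", "YNCAGG")
POLY_A_SITES = ("AATAAA",)
PROMOTER_LIKE = ("TATAAT",)
RIBOSOME_BINDING = ("AGGA", "GGAG")
WINDOW = 12  # bases examined 3' of AGGA/GGAG for a methionine codon (assumption)


def find_sites(motif: str, seq: str) -> list[int]:
    """Return 0-based start positions of all matches, including overlapping ones."""
    pattern = re.compile("(?=" + "".join(IUPAC[base] for base in motif) + ")")
    return [match.start() for match in pattern.finditer(seq)]


def scan(seq: str) -> dict[str, list[tuple[str, int]]]:
    """Report (motif, position) hits for each class of unwanted sequence."""
    seq = seq.upper()
    hits = {"splice": [], "poly_a": [], "prokaryotic": []}
    for motif in SPLICE_SITES:
        hits["splice"] += [(motif, i) for i in find_sites(motif, seq)]
    for motif in POLY_A_SITES:
        hits["poly_a"] += [(motif, i) for i in find_sites(motif, seq)]
    for motif in PROMOTER_LIKE:
        hits["prokaryotic"] += [(motif, i) for i in find_sites(motif, seq)]
    for motif in RIBOSOME_BINDING:
        for i in find_sites(motif, seq):
            downstream = seq[i + len(motif): i + len(motif) + WINDOW]
            if "ATG" in downstream:  # count only when a Met codon follows closely
                hits["prokaryotic"].append((motif, i))
    return hits
```

Under these assumptions, a codon replacement would be accepted when it lowers the number of hits returned by `scan` without changing the encoded amino acid.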

91. A first synthetic nucleic acid molecule comprising at least 300 nucleotides of a coding 
region for a reporter polypeptide which has at least 90% amino acid sequence identity to a 
reporter polypeptide encoded by a wild type nucleic acid sequence, wherein the codon 
composition of the first synthetic nucleic acid molecule is different at more than 25% of the 
codons from that of the wild type nucleic acid sequence, wherein the codons in the first synthetic 
nucleic acid molecule that are different than the codons in the wild type nucleic acid sequence 
are mammalian high usage codons selected to result in the first synthetic nucleic acid molecule 
having a reduced number of known mammalian transcription factor binding sequences. 

92. A first synthetic nucleic acid molecule comprising at least 300 nucleotides of a coding 
region for a reporter polypeptide which has at least 90% amino acid sequence identity to a 
reporter polypeptide encoded by a wild type nucleic acid sequence, wherein the first synthetic 
nucleic acid molecule is prepared by replacing codons in the wild type nucleic acid molecule 
with mammalian high usage codons, yielding a second synthetic nucleic acid molecule, and 
replacing codons in the second synthetic nucleic acid molecule with mammalian codons selected 
to reduce the number of a combination of different, known mammalian transcription factor 
binding sites, yielding the first synthetic nucleic acid molecule, wherein the codon composition 
of the first synthetic nucleic acid molecule is different at more than 25% of the codons from that 
of the wild type nucleic acid sequence, wherein the wild type nucleic acid sequence encodes 
chloramphenicol acetyltransferase, Renilla luciferase, beetle luciferase, beta-lactamase, beta- 
glucuronidase or beta-galactosidase. 

93. The first synthetic nucleic acid molecule of claim 91 or 92 which has at least 2-fold fewer 
mammalian transcription factor binding sequences relative to the wild type nucleic acid 
sequence. 

94. The first synthetic nucleic acid molecule of claim 91 or 92 wherein codon selection also 
reduces the number of intron splice sites, poly(A) addition sites or promoter sequences, or a 
combination thereof. 

95. The first synthetic nucleic acid molecule of claim 67 wherein the mammalian 
transcription factor binding sequences, intron splice sites, poly(A) addition sites and prokaryotic 
5' noncoding regulatory sequences in the wild type nucleic acid sequence or the second synthetic 
nucleic acid sequence are identified with software, wherein the identified intron splice sites are 
selected from AGGTRAGT, AGGTRAG, GGTRAGT or YNCAGG, the identified poly(A) 
addition sites have AATAAA, the identified prokaryotic 5' noncoding regulatory sequences are 
selected from TATAAT, or AGGA or GGAG if a methionine codon is within 12 bases 3' of the 
AGGA or GGAG, and the identified mammalian transcription factor binding sequences are in a 
database of transcription factor binding sequences, mutant transcription factor binding sequences 
and consensus transcription factor binding sequences, and identified under parameters that allow 
for partial ambiguity with sequences in the database, wherein the codons are selected to reduce 
the number of identified sequences or sites, and wherein the first synthetic nucleic acid molecule 
has fewer mammalian transcription factor binding sequences than the second synthetic nucleic 
acid molecule which has fewer mammalian transcription factor binding sequences than the wild 
type nucleic acid sequence. 

96. The first synthetic nucleic acid molecule of claim 74 wherein the mammalian 
transcription factor binding sequences, intron splice sites, poly(A) addition sites and prokaryotic 
5' noncoding regulatory sequences in the parent nucleic acid sequence or the second synthetic 
nucleic acid sequence are identified with software, wherein the identified intron splice sites are 
selected from AGGTRAGT, AGGTRAG, GGTRAGT or YNCAGG, the identified poly(A) 
addition sites have AATAAA, the identified prokaryotic 5' noncoding regulatory sequences are 
selected from TATAAT, or AGGA or GGAG if a methionine codon is within 12 bases 3' of the 
AGGA or GGAG, and the identified mammalian transcription factor binding sequences are in a 
database of transcription factor binding sequences, mutant transcription factor binding sequences 
and consensus transcription factor binding sequences, and identified under parameters that allow 
for partial ambiguity with sequences in the database, wherein the codons are selected to reduce 
the number of identified sequences or sites, and wherein the first synthetic nucleic acid molecule 
has fewer mammalian transcription factor binding sequences than the second synthetic nucleic 
acid molecule which has fewer mammalian transcription factor binding sequences than the 
parent nucleic acid sequence. 

TITLE OF THE INVENTION 
SYNTHETIC HEPATITIS C GENES 

CROSS-REFERENCE TO RELATED APPLICATIONS 
Not applicable. 

STATEMENT REGARDING FEDERALLY-SPONSORED R&D 
Not applicable. 

REFERENCE TO MICROFICHE APPENDIX 
Not applicable. 

FIELD OF THE INVENTION 
Not applicable. 


BACKGROUND OF THE INVENTION 

This invention relates to novel nucleic acid pharmaceutical 
products, specifically nucleic acid vaccine products. The nucleic acid 
vaccine products, when introduced directly into muscle cells, induce the 
production of immune responses which specifically recognize Hepatitis 
C virus (HCV). 

Hepatitis C Virus 

Non-A, Non-B hepatitis (NANBH) is a transmissible disease 
(or family of diseases) that is believed to be virally induced, and is 
distinguishable from other forms of virus-associated liver disease, such 
as those caused by hepatitis A virus (HAV), hepatitis B virus (HBV), 
delta hepatitis virus (HDV), cytomegalovirus (CMV) or Epstein-Barr 
virus (EBV). Epidemiologic evidence suggests that there may be three 
types of NANBH: the water-borne epidemic type; the blood or needle 
associated type; and the sporadically occurring (community acquired) 
type. However, the number of causative agents is unknown. Recently, a 
new viral species, hepatitis C virus (HCV), has been identified as the 
primary (if not only) cause of blood-associated NANBH (BB-NANBH). 
Hepatitis C appears to be the major form of transfusion-associated 
hepatitis in a number of countries, including the United States and 
Japan. There is also evidence implicating HCV in induction of 
hepatocellular carcinoma. Thus, a need exists for an effective method 
for preventing or treating HCV infection: currently, there is none. 

The HCV may be distantly related to the flaviviridae. The 
Flavivirus family contains a large number of viruses which are small, 
enveloped pathogens of man. The morphology and composition of 
Flavivirus particles are known, and are discussed in M. A. Brinton, in 
"The Viruses: The Togaviridae And Flaviviridae" (Series eds. 
Fraenkel-Conrat and Wagner, vol. eds. Schlesinger and Schlesinger, Plenum 
Press, 1986), pp. 327-374. Generally, with respect to morphology, 
Flaviviruses contain a central nucleocapsid surrounded by a lipid 
bilayer. Virions are spherical and have a diameter of about 40-50 nm. 
Their cores are about 25-30 nm in diameter. Along the outer surface of 
the virion envelope are projections measuring about 5-10 nm in length 
with terminal knobs about 2 nm in diameter. Typical examples of the 
family include Yellow Fever virus, West Nile virus, and Dengue Fever 
virus. They possess positive-stranded RNA genomes (about 11,000 
nucleotides) that are slightly larger than that of HCV and encode a 
polyprotein precursor of about 3500 amino acids. Individual viral 
proteins are cleaved from this precursor polypeptide. 

The genome of HCV appears to be single-stranded RNA 
containing about 10,000 nucleotides. The genome is positive-stranded, 
and possesses a continuous translational open reading frame (ORF) that 
encodes a polyprotein of about 3,000 amino acids. In the ORF, the 
structural proteins appear to be encoded in approximately the first 
quarter of the N-terminal region, with the majority of the polyprotein 
attributed to non-structural proteins. When compared with all known 
viral sequences, small but significant co-linear homologies are observed 
with the nonstructural proteins of the Flavivirus family, and with the 
pestiviruses (which are now also considered to be part of the Flavivirus 
family). 

Intramuscular inoculation of polynucleotide constructs, i.e., 
DNA plasmids encoding proteins, has been shown to result in the in situ 
generation of the protein in muscle cells. By using cDNA plasmids 
encoding viral proteins, both antibody and CTL responses were 
generated, providing homologous and heterologous protection against 
subsequent challenge with either the homologous strain or a 
different strain, respectively. Each of these types of immune responses 
offers a potential advantage over existing vaccination strategies. The 
use of PNVs (polynucleotide vaccines) to generate antibodies may result 
in an increased duration of the antibody responses as well as the 
provision of an antigen that can have both the exact sequence of the 
clinically circulating strain of virus and the proper 
post-translational modifications and conformation of the native protein (vs. a 
recombinant protein). The generation of CTL responses by this means 
offers the benefits of cross-strain protection without the use of a live 
potentially pathogenic vector or attenuated virus. 

Therefore, this invention contemplates methods for 
introducing nucleic acids into living tissue to induce expression of 
proteins. The invention provides a method for introducing viral 
proteins into the antigen processing pathway to generate virus-specific 
immune responses including, but not limited to, CTLs. Thus, the need 
for specific therapeutic agents capable of eliciting desired prophylactic 
immune responses against viral pathogens is met for HCV by this 
invention. Of particular importance in this therapeutic approach is the 
ability to induce T-cell immune responses which can prevent infections 
even of virus strains which are heterologous to the strain from which 
the antigen gene was obtained. Therefore, this invention provides DNA 
constructs encoding viral proteins of the hepatitis C virus core, envelope 
(E1), nonstructural (NS5) genes or any other HCV genes which encode 
products which generate specific immune responses including but not 
limited to CTLs. 

DNA Vaccines 

Benvenisty, N., and Reshef, L. [PNAS 83, 9551-9555, 
(1986)] showed that CaCl2-precipitated DNA introduced into mice 
intraperitoneally (i.p.), intravenously (i.v.) or intramuscularly (i.m.) 
could be expressed. The i.m. injection of DNA expression vectors 
without CaCl2 treatment in mice resulted in the uptake of DNA by the 
muscle cells and expression of the protein encoded by the DNA. The 
plasmids were maintained episomally and did not replicate. 
Subsequently, persistent expression has been observed after i.m. 
injection in skeletal muscle of rats, fish and primates, and cardiac 
muscle of rats. The technique of using nucleic acids as therapeutic 
agents was reported in WO90/11092 (4 October 1990), in which 
polynucleotides were used to vaccinate vertebrates. 

It is not necessary for the success of the method that 
immunization be intramuscular. The introduction of gold 
microprojectiles coated with DNA encoding bovine growth hormone 
(BGH) into the skin of mice resulted in production of anti-BGH 
antibodies in the mice. A jet injector has been used to transfect skin, 
muscle, fat, and mammary tissues of living animals. Various methods 
for introducing nucleic acids have been reviewed. Intravenous injection 
of a DNA:cationic liposome complex in mice was shown by Zhu et al., 
[Science 261:209-211 (9 July 1993)] to result in systemic expression of a 
cloned transgene. Ulmer et al., [Science 259:1745-1749, (1993)] 
reported on the heterologous protection against influenza virus infection 
by intramuscular injection of DNA encoding influenza virus proteins. 

The need for specific therapeutic and prophylactic agents 
capable of eliciting desired immune responses against pathogens and 
tumor antigens is met by the instant invention. Of particular 
importance in this therapeutic approach is the ability to induce T-cell 
immune responses which can prevent infections or disease caused even 
by virus strains which are heterologous to the strain from which the 
antigen gene was obtained. This is of particular concern when dealing 
with HIV as this virus has been recognized to mutate rapidly and many 
virulent isolates have been identified [see, for example, LaRosa et al., 
Science 249:932-935 (1990), identifying 245 separate HIV isolates]. In 
response to this recognized diversity, researchers have attempted to 
generate CTLs based on peptide immunization. Thus, Takahashi et al., 
[Science 255:333-336 (1992)] reported on the induction of broadly 
cross-reactive cytotoxic T cells recognizing an HIV envelope (gp160) 
determinant. However, those workers recognized the difficulty in 
achieving a truly cross-reactive CTL response and suggested that there 
is a dichotomy between the priming or restimulation of T cells, which is 
very stringent, and the elicitation of effector function, including 
cytotoxicity, from already stimulated CTLs. 

Wang et al. reported on elicitation of immune responses in 
mice against HIV by intramuscular inoculation with a cloned, genomic 
(unspliced) HIV gene. However, the level of immune responses 
achieved in these studies was very low. In addition, the Wang et al. 
DNA construct utilized an essentially genomic piece of HIV encoding 
contiguous Tat/REV-gp160-Tat/REV coding sequences. As is described 
in detail below, this is a suboptimal system for obtaining high-level 
expression of the gp160. It also is potentially dangerous because 
expression of Tat contributes to the progression of Kaposi's Sarcoma. 

WO 93/17706 describes a method for vaccinating an animal 
against a virus, wherein carrier particles are coated with a gene 
construct and the coated particles are accelerated into cells of an animal. 

The instant invention contemplates any of the known 
methods for introducing polynucleotides into living tissue to induce 
expression of proteins. However, this invention provides a novel 
immunogen for introducing proteins into the antigen processing 
pathway to efficiently generate specific CTLs and antibodies. 

Codon Usage and Codon Context 

The codon pairings of organisms are highly nonrandom, 
and differ from organism to organism. This information is used to 
construct and express altered or synthetic genes having desired levels of 
translational efficiency, to determine which regions in a genome are 
protein coding regions, to introduce translational pause sites into 
heterologous genes, and to ascertain relationship or ancestral origin of 
nucleotide sequences. 

The expression of foreign heterologous genes in 
transformed organisms is now commonplace. A large number of 
mammalian genes, including, for example, murine and human genes, 
have been successfully inserted into single celled organisms. Standard 
techniques in this regard include introduction of the foreign gene to be 
expressed into a vector such as a plasmid or a phage and utilizing that 
vector to insert the gene into an organism. The native promoters for 
such genes are commonly replaced with strong promoters compatible 
with the host into which the gene is inserted. Protein sequencing 
machinery permits elucidation of the amino acid sequences of even 
minute quantities of native protein. From these amino acid sequences, 
DNA sequences coding for those proteins can be inferred. DNA 
synthesis is also a rapidly developing art, and synthetic genes 
corresponding to those inferred DNA sequences can be readily 
constructed. 

Despite the burgeoning knowledge of expression systems 
and recombinant DNA, significant obstacles remain when one attempts 
to express a foreign or synthetic gene in an organism. Many native, 
active proteins, for example, are glycosylated in a manner different 
from that which occurs when they are expressed in a foreign host. For 
this reason, eukaryotic hosts such as yeast may be preferred to bacterial 
hosts for expressing many mammalian genes. The glycosylation 
problem is the subject of continuing research. 

Another problem is more poorly understood. Often 
translation of a synthetic gene, even when coupled with a strong 
promoter, proceeds much less efficiently than would be expected. The 
same is frequently true of exogenous genes foreign to the expression 
organism. Even when the gene is transcribed in a sufficiently efficient 
manner that recoverable quantities of the translation product are 
produced, the protein is often inactive or otherwise different in 
properties from the native protein. 

It is recognized that the latter problem is commonly due to 
differences in protein folding in various organisms. The solution to this 
problem has been elusive, and the mechanisms controlling protein 
folding are poorly understood. 

The problems related to translational efficiency are 
believed to be related to codon context effects. The protein coding 
regions of genes in all organisms are subject to a wide variety of 
functional constraints, some of which depend on the requirement for 
encoding a properly functioning protein, as well as appropriate 
translational start and stop signals. However, several features of protein 
coding regions have been discerned which are not readily understood in 
terms of these constraints. Two important classes of such features are 
those involving codon usage and codon context. 

It is known that codon utilization is highly biased and varies 
considerably between different organisms. Codon usage patterns have 
been shown to be related to the relative abundance of tRNA 
isoacceptors. Genes encoding proteins of high versus low abundance 
show differences in their codon preferences. The possibility that biases 
in codon usage alter peptide elongation rates has been widely discussed. 
While differences in codon use are associated with differences in 
translation rates, direct effects of codon choice on translation have been 
difficult to demonstrate. Other proposed constraints on codon usage 
patterns include maximizing the fidelity of translation and optimizing 
the kinetic efficiency of protein synthesis. 

Apart from the non-random use of codons, considerable 
evidence has accumulated that codon/anticodon recognition is influenced 
by sequences outside the codon itself, a phenomenon termed "codon 
context." There exists a strong influence of nearby nucleotides on the 
efficiency of suppression of nonsense codons as well as missense codons. 
Clearly, the abundance of suppressor activity in natural bacterial 
populations, as well as the use of "termination" codons to encode 
selenocysteine and phosphoserine, require that termination be 
context-dependent. Similar context effects have been shown to influence the 
fidelity of translation, as well as the efficiency of translation initiation. 

Statistical analyses of protein coding regions of E. coli have 
demonstrated another manifestation of "codon context." The presence of 
a particular codon at one position strongly influences the frequency of 
occurrence of certain nucleotides in neighboring codons, and these 
context constraints differ markedly for genes expressed at high versus 
low levels. Although the context effect has been recognized, the 
predictive value of the statistical rules relating to preferred nucleotides 
adjacent to codons is relatively low. This has limited the utility of such 
nucleotide preference data for selecting codons to effect desired levels 
of translational efficiency. 
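
One simple way to look for non-random codon context in a coding region is to compare how often each adjacent codon pair is observed with how often it would be expected if codons were placed independently. The sketch below is offered only as an illustration of that idea and is not the statistical analysis referred to above. The function name and the toy sequence are hypothetical, and a real analysis would pool many genes rather than a single short reading frame.

```python
from collections import Counter


def codon_pair_ratios(orf: str) -> dict[tuple[str, str], float]:
    """Observed/expected ratio for each adjacent codon pair in one reading frame.

    Expected counts assume codons are placed independently, in proportion to
    their overall frequency in the same sequence (a simplifying assumption).
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3          # ignore any trailing partial codon
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    if len(codons) < 2:
        raise ValueError("need at least two codons to form a pair")
    singles = Counter(codons)
    pairs = Counter(zip(codons, codons[1:]))
    n_codons, n_pairs = len(codons), len(codons) - 1
    ratios = {}
    for (first, second), observed in pairs.items():
        expected = singles[first] / n_codons * singles[second] / n_codons * n_pairs
        ratios[(first, second)] = observed / expected
    return ratios


# Toy example: CTG and AAA strictly alternate, so the CTG-AAA pair is
# over-represented relative to independent placement.
ratios = codon_pair_ratios("CTGAAACTGAAACTGAAA")
for pair, ratio in sorted(ratios.items()):
    print(pair, round(ratio, 2))
```

A ratio well above or below 1 for a pair suggests a context preference, although, as noted above, the predictive value of such statistics is limited.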

The advent of automated nucleotide sequencing equipment 
has made available large quantities of sequence data for a wide variety 
of organisms. Understanding those data presents substantial difficulties. 
For example, it is important to identify the coding regions of the 
genome in order to relate the genetic sequence data to protein 
sequences. In addition, the ancestry of the genome of certain organisms 
is of substantial interest. It is known that genomes of some organisms 
are of mixed ancestry. Some sequences that are viral in origin are now 
stably incorporated into the genome of eukaryotic organisms. The viral 
sequences themselves may have originated in another substantially 
unrelated species. An understanding of the ancestry of a gene can be 
important in drawing proper analogies between related genes and their 
translation products in other organisms. 

There is a need for a better understanding of codon context 
effects on translation, and for a method for determining the appropriate 
codons for any desired translational effect. There is also a need for a 
method for identifying coding regions of the genome from nucleotide 
sequence data. There is also a need for a method for controlling protein 
folding and for ensuring that the product of a foreign gene will fold 
appropriately when expressed in a host. Genes altered or constructed in 
accordance with desired translational efficiencies would be of significant worth. 

Another aspect of the practice of recombinant DNA 
techniques for the expression by microorganisms of proteins of 
industrial and pharmaceutical interest is the phenomenon of "codon 
preference". While it was earlier noted that the existing machinery for 
gene expression in genetically transformed host cells will "operate" to 
construct a given desired product, levels of expression attained in a 
microorganism can be subject to wide variation, depending in part on 
specific alternative forms of the amino acid-specifying genetic code 
present in an inserted exogenous gene. A "triplet" codon of four 
possible nucleotide bases can exist in 64 variant forms. That these 
forms provide the message for only 20 different amino acids (as well as 
translation initiation and termination) means that some amino acids 
can be coded for by more than one codon. Indeed, some amino acids 
have as many as six "redundant", alternative codons while some others 
have a single, required codon. For reasons not completely understood, 
alternative codons are not at all uniformly present in the endogenous 
DNA of differing types of cells and there appears to exist a variable 
natural hierarchy or "preference" for certain codons in certain types of 
cells. 
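
The counting argument above (four bases in three positions give 4 x 4 x 4 = 64 codons for 20 amino acids) can be checked directly. The following is a minimal sketch that assumes the standard genetic code, written in the conventional T, C, A, G ordering. The variable names are hypothetical and "*" marks a stop codon.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TTT, TTC, TTA, TTG, TCT, ...
# order; "*" denotes a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Number of alternative codons available for each amino acid (and for stop).
degeneracy = Counter(CODON_TABLE.values())

print("total codons:", len(CODON_TABLE))
print("amino acids encoded:", len(set(CODON_TABLE.values()) - {"*"}))
print("six-codon amino acids:", sorted(a for a, n in degeneracy.items() if n == 6))
print("single-codon amino acids:", sorted(a for a, n in degeneracy.items() if n == 1))
```

Under the standard code, leucine, serine and arginine are the amino acids with six alternative codons, while methionine and tryptophan each have a single codon.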

As one example, the amino acid leucine is specified by any 
of six DNA codons including CTA, CTC, CTG, CTT, TTA, and TTG 
(which correspond, respectively, to the mRNA codons, CUA, CUC, 
CUG, CUU, UUA and UUG). Exhaustive analysis of genome codon 
frequencies for microorganisms has revealed endogenous DNA of E. 
coli most commonly contains the CTG leucine-specifying codon, while 
the DNA of yeasts and slime molds most commonly includes a TTA 
leucine-specifying codon. In view of this hierarchy, it is generally held 
that the likelihood of obtaining high levels of expression of a 
leucine-rich polypeptide by an E. coli host will depend to some extent on the 
frequency of codon use. For example, a gene rich in TTA codons will 
in all probability be poorly expressed in E. coli, whereas a CTG-rich 
gene will probably express the polypeptide at high levels. Similarly, when 
yeast cells are the projected transformation host cells for expression of a 
leucine-rich polypeptide, a preferred codon for use in an inserted DNA 
would be TTA. 
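
The leucine example can be expressed as a simple recoding rule. The sketch below is illustrative only. It uses just the two host preferences stated above (CTG for E. coli, TTA for yeast), leaves every non-leucine codon untouched, and assumes an in-frame DNA sequence. The function name, the dictionary keys and the toy sequence are hypothetical.

```python
# The six DNA codons for leucine, as listed in the text.
LEUCINE_CODONS = {"CTA", "CTC", "CTG", "CTT", "TTA", "TTG"}

# Host preferences taken from the passage above; keys are hypothetical labels.
PREFERRED_LEUCINE = {"e_coli": "CTG", "yeast": "TTA"}


def recode_leucine(orf: str, host: str) -> str:
    """Replace every leucine codon in an in-frame ORF with the host's preferred one."""
    orf = orf.upper()
    if len(orf) % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    preferred = PREFERRED_LEUCINE[host]
    codons = [orf[i:i + 3] for i in range(0, len(orf), 3)]
    return "".join(preferred if codon in LEUCINE_CODONS else codon for codon in codons)


def to_mrna(dna: str) -> str:
    """Write a DNA coding sequence as the corresponding mRNA (T replaced by U)."""
    return dna.upper().replace("T", "U")


gene = "ATGTTACTTAAA"                        # Met-Leu-Leu-Lys (toy sequence)
for_e_coli = recode_leucine(gene, "e_coli")  # both Leu codons become CTG
for_yeast = recode_leucine(gene, "yeast")    # both Leu codons become TTA
print(for_e_coli, to_mrna(for_e_coli))
print(for_yeast, to_mrna(for_yeast))
```

The encoded polypeptide is unchanged by the recoding; only the choice among synonymous leucine codons differs between the two hosts.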

The implications of codon preference phenomena on 
recombinant DNA techniques are manifest, and the phenomenon may 
serve to explain many prior failures to achieve high expression levels of 
exogenous genes in successfully transformed host organisms: a less 
"preferred" codon may be repeatedly present in the inserted gene and 
the host cell machinery for expression may not operate as efficiently. 
This phenomenon suggests that synthetic genes which have been 
designed to include a projected host cell's preferred codons provide a 
preferred form of foreign genetic material for practice of recombinant 
DNA techniques. 

Protein Trafficking 

The diversity of function that typifies eukaryotic cells 
depends upon the structural differentiation of their membrane 
boundaries. To generate and maintain these structures, proteins must be 
transported from their site of synthesis in the endoplasmic reticulum to 
predetermined destinations throughout the cell. This requires that the 
trafficking proteins display sorting signals that are recognized by the 
molecular machinery responsible for route selection located at the 
access points to the main trafficking pathways. Sorting decisions for 
most proteins need to be made only once as they traverse their 
biosynthetic pathways since their final destination, the cellular location 
at which they perform their function, becomes their permanent 
residence. 

Maintenance of intracellular integrity depends in part on 
the selective sorting and accurate transport of proteins to their correct 
destinations. Over the past few years the molecular machinery for 
targeting and localization of proteins has been studied 
vigorously. Defined sequence motifs have been identified on proteins 
which can act as 'address labels'. A number of sorting signals have been 
found associated with the cytoplasmic domains of membrane proteins. 


SUMMARY OF THE INVENTION 

This invention relates to novel formulations of nucleic acid 
pharmaceutical products, specifically nucleic acid vaccine products. 
The nucleic acid products, when introduced directly into muscle cells, 
induce the production of immune responses which specifically recognize 
Hepatitis C virus (HCV). 

BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 shows the nucleotide sequence of the V1Ra vector. 
Figure 2 is a diagram of the V1Ra vector. 
Figure 3 is a diagram of the Vtpa vector. 
Figure 4 is the VUb vector. 
Figure 5 shows an optimized sequence of the HCV core antigen. 
Figure 6 shows V1Ra.HCV1CorePAb, Vtpa.HCV1CorePAb and VUb.HCV1CorePAb. 
Figure 7 shows the Hepatitis C Virus Core Antigen Sequence. 
Figure 8 shows codon utilization in human protein-coding sequences (from Lathe et al.). 
Figure 9 shows an optimized sequence of the HCV E1 protein. 
Figure 10 shows an optimized sequence of the HCV E2 protein. 
Figure 11 shows an optimized sequence of the HCV E1 + E2 proteins. 
Figure 12 shows an optimized sequence of the HCV NS5a protein. 
Figure 13 shows an optimized sequence of the HCV NS5b protein. 

DETAILED DESCRIPTION OF THE INVENTION 

This invention relates to novel formulations of nucleic acid 
pharmaceutical products, specifically nucleic acid vaccine products. 
The nucleic acid vaccine products, when introduced directly into muscle 
cells, induce the production of immune responses which specifically 
recognize Hepatitis C virus (HCV). 

Non-A, Non-B hepatitis (NANBH) is a transmissible disease 
(or family of diseases) that is believed to be virally induced, and is 
distinguishable from other forms of virus-associated liver disease, such 
as those caused by hepatitis A virus (HAV), hepatitis B virus (HBV), 
delta hepatitis virus (HDV), cytomegalovirus (CMV) or Epstein-Barr 
virus (EBV). Epidemiologic evidence suggests that there may be three 
types of NANBH: the water-borne epidemic type; the blood or needle 
associated type; and the sporadically occurring (community acquired) 
type. However, the number of causative agents is unknown. Recently, a 
new viral species, hepatitis C virus (HCV), has been identified as the 
primary (if not only) cause of blood-associated NANBH (BB-NANBH). 
Hepatitis C appears to be the major form of transfusion-associated 
hepatitis in a number of countries, including the United States and 
Japan. There is also evidence implicating HCV in induction of 
hepatocellular carcinoma. Thus, a need exists for an effective method 
for preventing or treating HCV infection: currently, there is none. 

The HCV may be distantly related to the flaviviridae. The 
Flavivirus family contains a large number of viruses which are small, 
enveloped pathogens of man. The morphology and composition of 
Flavivirus particles are known, and are discussed in M. A. Brinton, in 
"The Viruses: The Togaviridae And Flaviviridae" (Series eds. 
Fraenkel-Conrat and Wagner, vol. eds. Schlesinger and Schlesinger, Plenum 
Press, 1986), pp. 327-374. Generally, with respect to morphology, 
Flaviviruses contain a central nucleocapsid surrounded by a lipid 
bilayer. Virions are spherical and have a diameter of about 40-50 nm. 
Their cores are about 25-30 nm in diameter. Along the outer surface of 
the virion envelope are projections measuring about 5-10 nm in length 
with terminal knobs about 2 nm in diameter. Typical examples of the 
family include Yellow Fever virus, West Nile virus, and Dengue Fever 
virus. They possess positive-stranded RNA genomes (about 11,000 
nucleotides) that are slightly larger than that of HCV and encode a 
polyprotein precursor of about 3500 amino acids. Individual viral 
proteins are cleaved from this precursor polypeptide. 

The genome of HCV appears to be single-stranded RNA
containing about 10,000 nucleotides. The genome is positive-stranded,
and possesses a continuous translational open reading frame (ORF) that
encodes a polyprotein of about 3,000 amino acids. In the ORF, the
structural proteins appear to be encoded in approximately the first
quarter of the N-terminal region, with the majority of the polyprotein
attributed to non-structural proteins. When compared with all known
viral sequences, small but significant co-linear homologies are observed
with the nonstructural proteins of the Flavivirus family, and with the
pestiviruses (which are now also considered to be part of the Flavivirus
family).

Intramuscular inoculation of polynucleotide constructs, i.e.,
DNA plasmids encoding proteins, has been shown to result in the
generation of the encoded protein in situ in muscle cells. By using
cDNA plasmids encoding viral proteins, both antibody and CTL
responses were generated, providing homologous and heterologous
protection against subsequent challenge with either the homologous or
a cross-strain virus, respectively. Each of these types of immune
responses offers a potential advantage over existing vaccination
strategies. The use of PNVs (polynucleotide vaccines) to generate
antibodies may result in an increased duration of the antibody responses
as well as the provision of an antigen that can have both the exact
sequence of the clinically circulating strain of virus as well as the
proper post-translational modifications and conformation of the native
protein (vs. a recombinant protein). The generation of CTL responses
by this means offers the benefits of cross-strain protection without the
use of a live, potentially pathogenic vector or attenuated virus.

The standard techniques of molecular biology for
preparing and purifying DNA constructs enable the preparation of the
DNA therapeutics of this invention. While standard techniques of
molecular biology are therefore sufficient for the production of the
products of this invention, the specific constructs disclosed herein
provide novel therapeutics which surprisingly produce cross-strain
protection, a result heretofore unattainable with standard inactivated
whole virus or subunit protein vaccines.

The amount of expressible DNA to be introduced to a
vaccine recipient will depend on the strength of the transcriptional and
translational promoters used in the DNA construct, and on the
immunogenicity of the expressed gene product. In general, an
immunologically or prophylactically effective dose of about 1 µg to 1
mg, and preferably about 10 µg to 300 µg, is administered directly into
muscle tissue. Subcutaneous injection, intradermal introduction,
impression through the skin, and other modes of administration such as
intraperitoneal, intravenous, or inhalation delivery are also
contemplated. It is also contemplated that booster vaccinations are to be
provided.

The DNA may be naked, that is, unassociated with any
proteins, adjuvants or other agents which impact on the recipient's
immune system. In this case, it is desirable for the DNA to be in a
physiologically acceptable solution, such as, but not limited to, sterile
saline or sterile buffered saline. Alternatively, the DNA may be
associated with surfactants, liposomes, such as lecithin liposomes or
other liposomes known in the art, as a DNA-liposome mixture (see, for
example, WO93/24640), or the DNA may be associated with an adjuvant
known in the art to boost immune responses, such as a protein or other
carrier. Agents which assist in the cellular uptake of DNA, such as, but
not limited to, calcium ions, detergents, viral proteins and other
transfection facilitating agents may also be used to advantage. These
agents are generally referred to as transfection facilitating agents and as
pharmaceutically acceptable carriers. As used herein, the term gene
refers to a segment of nucleic acid which encodes a discrete polypeptide.
The terms pharmaceutical and vaccine are used interchangeably to
indicate compositions useful for inducing immune responses. The terms
construct and plasmid are used interchangeably. The term vector is
used to indicate a DNA into which genes may be cloned for use
according to the method of this invention.

The following examples are provided to further define the
invention, without limiting the invention to the specifics of the
examples.

EXAMPLE 1

V1J EXPRESSION VECTORS:

V1J is derived from vectors V1 and pUC18, a
commercially available plasmid. V1 was digested with SspI and EcoRI
restriction enzymes, producing two fragments of DNA. The smaller of
these fragments, containing the CMVintA promoter and Bovine Growth
Hormone (BGH) transcription termination elements which control the
expression of heterologous genes, was purified from an agarose
electrophoresis gel. The ends of this DNA fragment were then
"blunted" using the T4 DNA polymerase enzyme in order to facilitate
its ligation to another "blunt-ended" DNA fragment.

pUC18 was chosen to provide the "backbone" of the
expression vector. It is known to produce high yields of plasmid, is
well-characterized by sequence and function, and is of minimum size.
We removed the entire lac operon from this vector, which was
unnecessary for our purposes and may be detrimental to plasmid yields
and heterologous gene expression, by partial digestion with the HaeII
restriction enzyme. The remaining plasmid was purified from an
agarose electrophoresis gel, blunt-ended with the T4 DNA polymerase,
treated with calf intestinal alkaline phosphatase, and ligated to the
CMVintA/BGH element described above. Plasmids exhibiting either of
two possible orientations of the promoter elements within the pUC
backbone were obtained. One of these plasmids gave much higher
yields of DNA in E. coli and was designated V1J. This vector's
structure was verified by sequence analysis of the junction regions and
was subsequently demonstrated to give comparable or higher expression
of heterologous genes compared with V1. The ampicillin resistance
marker was replaced with the neomycin resistance marker to yield
vector V1Jneo.

An Sfi I site was added to V1Jneo to facilitate integration
studies. A commercially available 13 base pair Sfi I linker (New
England BioLabs) was added at the Kpn I site within the BGH sequence
of the vector. V1Jneo was linearized with Kpn I, gel purified, blunted
by T4 DNA polymerase, and ligated to the blunt Sfi I linker. Clonal
isolates were chosen by restriction mapping and verified by sequencing
through the linker. The new vector was designated V1Jns. Expression
of heterologous genes in V1Jns (with Sfi I) was comparable to
expression of the same genes in V1Jneo (with Kpn I).

Vector V1Ra (sequence is shown in Figure 1; map is shown
in Figure 2) was derived from vector V1R, a derivative of the V1Jns
vector. Multiple cloning sites (BglII, KpnI, EcoRV, EcoRI, SalI, and
NotI) were introduced into V1R to create the V1Ra vector to improve
the convenience of subcloning. V1Ra vector derivatives containing the
tpa leader sequence and ubiquitin sequence were generated (Vtpa
(Figure 3) and VUb (Figure 4), respectively). Expression of viral
antigen from the Vtpa vector will target the antigen protein into the
exocytic pathway, thus producing a secretable form of the antigen
proteins. These secreted proteins are likely to be captured by
professional antigen presenting cells, such as macrophages and dendritic
cells, and processed and presented by class II molecules to activate CD4+
Th cells. They also are more likely to efficiently stimulate antibody
responses. Expression of viral antigen through the VUb vector will
produce a ubiquitin and antigen fusion protein. The uncleavable
ubiquitin segment (glycine to alanine change at the cleavage site, Butt et
al., JBC 263:16364, 1988) will target the viral antigen to
ubiquitin-associated proteasomes for rapid degradation. The resulting peptide
fragments will be transported into the ER for antigen presentation by
class I molecules. This modification is intended to enhance the class I
molecule-restricted CTL responses against the viral antigen (Townsend
et al., JEM 168:1211, 1988).

EXAMPLE 2

DESIGN AND CONSTRUCTION OF THE SYNTHETIC GENES

A. Design of Synthetic Gene Segments for HCV Gene Expression:

Gene segments were converted to sequences having
identical translated sequences (except where noted) but with alternative
codon usage as defined by R. Lathe in a research article from J. Molec.
Biol. Vol. 183, pp. 1-12 (1985) entitled "Synthetic Oligonucleotide
Probes Deduced from Amino Acid Sequence Data: Theoretical and
Practical Considerations". The methodology described below was based
on our hypothesis that the known inability to express a gene efficiently
in mammalian cells is a consequence of the overall transcript
composition. Thus, using alternative codons encoding the same protein
sequence may remove the constraints on HCV gene expression.
Inspection of the codon usage within the HCV genome revealed that a high
percentage of codons were among those infrequently used by highly
expressed human genes. The specific codon replacement method
employed may be described as follows, employing data from Lathe et
al.:

1. Identify placement of codons for proper open
reading frame.

2. Compare wild type codon for observed frequency of
use by human genes (refer to Table 3 in Lathe et al.).

3. If codon is not the most commonly employed,
replace it with an optimal codon for high expression based on data in
Table 5.

4. Inspect the third nucleotide of the new codon and the
first nucleotide of the adjacent codon immediately 3' of the first. If a
5'-CG-3' pairing has been created by the new codon selection, replace it
with the choice indicated in Table 5.

5. Repeat this procedure until the entire gene segment
has been replaced.

6. Inspect new gene sequence for undesired sequences
generated by these codon replacements (e.g., "ATTTA" sequences,
inadvertent creation of intron splice recognition sites, unwanted
restriction enzyme sites, etc.) and substitute codons that eliminate these
sequences.

7. Assemble synthetic gene segments and test for
improved expression.
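
The replacement steps listed above can be sketched in Python as follows. This is a minimal illustration, not the procedure actually used: the PREFERRED and FALLBACK mappings are hypothetical stand-ins for the codon data in the cited Table 5, which is not reproduced in this text, and the function names are likewise invented for the example. Only the standard genetic code table is taken as given.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# HYPOTHETICAL stand-ins for "Table 5": a few synonymous replacements only.
PREFERRED = {"CGT": "AGA", "CTA": "CTG", "TCG": "AGC", "GCG": "GCC", "GGT": "GGC"}
# HYPOTHETICAL second choices that do not end in C (used to break up CG pairs).
FALLBACK = {"GCC": "GCT", "AGC": "TCT", "GGC": "GGA"}


def split_codons(seq):
    """Step 1: split an in-frame coding sequence into codons."""
    if len(seq) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq):
    """Translate a DNA coding sequence with the standard genetic code."""
    return "".join(CODE[c] for c in split_codons(seq.upper()))


def optimize(seq):
    """Steps 2-5: swap in preferred codons, then undo new CG junctions."""
    original = split_codons(seq.upper())
    codons = [PREFERRED.get(c, c) for c in original]
    for i in range(len(codons) - 1):
        changed = codons[i] != original[i] or codons[i + 1] != original[i + 1]
        if changed and codons[i][2] == "C" and codons[i + 1][0] == "G":
            codons[i] = FALLBACK.get(codons[i], codons[i])
    return "".join(codons)


def flag_undesired(seq, motifs=("ATTTA",)):
    """Step 6: report (motif, position) for each undesired motif found."""
    return [(m, i) for m in motifs for i in range(len(seq)) if seq.startswith(m, i)]
```

In this sketch, `translate` can be used to confirm that the optimized sequence still encodes the same protein, and `flag_undesired` only reports motifs; choosing the substitute codons in step 6 is left to the user.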

B. HCV CORE ANTIGEN SEQUENCE 

The consensus core sequence of HCV was adopted from a
generalized core sequence reported by Bukh et al. (PNAS, 91:8239,
1994). This core sequence contains all the identified CTL epitopes in
both human and mouse. The gene is composed of 573 nucleotides and
encodes 191 amino acids. The predicted molecular weight is about 23
kDa.

The codon replacement was conducted to eliminate codons
which may hinder the expression of the HCV core protein in transfected
mammalian cells in order to maximize the translational efficiency of
the DNA vaccine. Twenty three point two percent (23.2%) of the nucleotide
sequence (133 out of 573 nucleotides) was altered, resulting in changes
of 61.3% of the codons (117 out of 191 codons) in the core antigen
sequence. The optimized nucleotide sequence of HCV core is shown in
Figure 5.
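
As a quick consistency check on the figures quoted above, the short Python snippet below recomputes the two percentages from the stated counts. The variable names are chosen for this illustration only; the counts themselves are the ones given in the text.

```python
# Counts as stated in the text for the HCV core gene.
nucleotides_total = 573
nucleotides_changed = 133
codons_total = 191
codons_changed = 117

# 573 nucleotides correspond to 191 codons of three nucleotides each.
codons_from_length = nucleotides_total // 3

# Percentages rounded to one decimal place, as in the text.
pct_nucleotides = round(100 * nucleotides_changed / nucleotides_total, 1)
pct_codons = round(100 * codons_changed / codons_total, 1)

print(codons_from_length, pct_nucleotides, pct_codons)
```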

C. CONSTRUCTION OF THE SYNTHETIC CORE GENE 

The optimized HCV core gene (Figure 5) was constructed
as a synthetic gene annealed from multiple synthetic oligonucleotides.
To facilitate the identification and evaluation of the synthetic gene
expression in cell culture and its immunogenicity in mice, a CTL
epitope derived from influenza virus nucleoprotein residues 366-374
and an antibody epitope sequence derived from SV40 T antigen residues
684-698 were tagged to the carboxyl terminal of the core sequence
(Figure 6). For clinical use it may be desired to express the core
sequence without the nucleoprotein 366-374 and SV40 T 684-698
sequences. For this reason, the sequence of the two epitopes is flanked
by two EcoRI sites which will be used to excise this fragment of
sequence at a later time. Thus an embodiment of the invention for
clinical use could consist of the V1Ra.HCV1CorePAb,
Vtpa.HCV1CorePAb, or VUb.HCV1CorePAb plasmids that had been
cut with EcoRI, annealed, and ligated to yield plasmids
V1Ra.HCV1Core, Vtpa.HCV1Core, and VUb.HCV1Core.

The synthetic gene was built as three separate segments in
three vectors: nucleotides 1 to 80 in V1Ra, nucleotides 80 to 347 (BstXI
site) in pUC18, and nucleotides 347 to 573 plus the two epitope
sequences in pUC18. All the segments were verified by DNA
sequencing, and joined together in the V1Ra vector.

D. HCV Gene Expression Constructs: 

In each case, the junction sequences from the 5' promoter
region (CMVintA) into the cloned gene are shown. The position at which
the junction occurs is demarcated by a "/", which does not represent any
discontinuity in the sequence.

The nomenclature for these constructs follows the
convention: "Vector name-HCV strain-gene".

V1Ra.HCV1.CorePAb
--IntA--AGA TCT ACC / ATG AGC--HCV.Core.--GCC / GAA TTC GCT TCC--
PAb Sequence--TAA / ACC CGG GAA TTC TAA A / GTC GAC--BGH--

Vtpa.HCV1.CorePAb
--IntA--ATC ACC / ATG GAT--tpa leader--GAG ATC-TTC / ATG AGC--
HCV.Core.--GCC / GAA TTC GCT TCC--PAb Sequence--TAA / ACC CGG GAA
TTC TAA A / GTC GAC--BGH--

VUb.HCV1.CorePAb
--IntA--AGA TCC ACC / ATG CAG--Ubiquitin--GGT GCA GAT CTG / ATG AGC--
HCV.Core.--GCC / GAA TTC GCT TCC--PAb Sequence--TAA / ACC CGG GAA
TTC TAA A / GTC GAC--BGH--

V1Ra.HCV1.Core
--IntA--AGA TCT ACC / ATG AGC--HCV.Core.--GCC / TAA A / GTC GAC--
BGH--

Vtpa.HCV1.Core
--IntA--ATC ACC / ATG GAT--tpa leader--GAG ATC-TTC / ATG AGC--
HCV.Core.--GCC / TAA A / GTC GAC--BGH--

VUb.HCV1.Core
--IntA--AGA TCC ACC / ATG CAG--Ubiquitin--GGT GCA GAT CTG / ATG AGC--
HCV.Core.--GCC / TAA A / GTC GAC--BGH--



E. OTHER SYNTHETIC HCV GENES 

Using similar codon optimization techniques, synthetic
genes encoding the HCV E1 (Figure 9), HCV E2 (Figure 10), HCV
E1+E2 (Figure 11), HCV NS5a (Figure 12) and HCV NS5b (Figure 13)
proteins were created.

WHAT IS CLAIMED:

1. A synthetic polynucleotide comprising a DNA
sequence encoding an HCV protein selected from the group consisting
of HCV core protein, HCV E1 protein, HCV E1+E2 protein, HCV NS5a
protein, HCV NS5b protein and fragments thereof, the DNA sequence
comprising codons optimized for expression in a vertebrate host.

2. A plasmid vector comprising the polynucleotide of
Claim 1, the plasmid vector being suitable for immunization of a
vertebrate host.

3. The polynucleotide of Claim [number illegible] which is HCV
genotype I/1a core.


4. The polynucleotide of Claim 1 having the sequence 

6. The polynucleotide of Claim 4 from which the PAb
sequence has been removed.



7. The plasmid vector of Claim 5 from which the PAb 
sequence has been removed. 

8. A method for inducing immune responses in a
vertebrate against HCV epitopes which comprises introducing between 1
ng and 100 mg of the polynucleotide of Claim 1 into the tissue of the
vertebrate.

9. A method for inducing immune responses against 
infection or disease caused by HCV which comprises introducing into 
the tissue of a vertebrate the polynucleotide of Claim 1. 

10. A vaccine for inducing immune responses against
HCV infection which comprises the polynucleotide of Claim 1 and a
pharmaceutically acceptable carrier.

11. A method for inducing anti-HCV immune responses
in a primate which comprises introducing the polynucleotide of Claim 1
into the tissue of said primate and concurrently administering
interleukin-12 parenterally.

12. A method of inducing an antigen presenting cell to
stimulate cytotoxic and helper T-cell proliferation and effector functions
including lymphokine secretion specific to HCV antigens which
comprises exposing cells of a vertebrate in vivo to the polynucleotide of
Claim 1.

13. A method of treating a patient in need of such
treatment comprising administering to the patient the polynucleotide of
Claim 1 in combination with interferon-alpha, Ribavirin, Zidovudine,
or other pharmaceutically acceptable antiviral agents.

14. A pharmaceutical composition comprising the
polynucleotide of Claim 1.

15. A method of inducing an immune response
comprising administering the polynucleotide of Claim 1 to a patient, the
administration of the polynucleotide antedating or coinciding or
following administration to the patient of a subunit, recombinant,
recombinant live vector, inactivated, recombinant inactivated vector, or
live attenuated HCV vaccine.

16. A method for inducing immune responses in a
vertebrate against HCV epitopes which comprises introducing between 1
ng and 100 mg of the polynucleotide of Claim 2 into the tissue of the
vertebrate.

17. A method for inducing immune responses against
infection or disease caused by HCV which comprises introducing into
the tissue of a vertebrate the polynucleotide of Claim 2.

18. A vaccine for inducing immune responses against 
HCV infection which comprises the polynucleotide of Claim 2 and a 
pharmaceutically acceptable carrier. 

19. A method for inducing anti-HCV immune responses
in a primate which comprises introducing the polynucleotide of Claim 2
into the tissue of said primate and concurrently administering
interleukin 12 parenterally.

20. A method of inducing an antigen presenting cell to
stimulate cytotoxic and helper T-cell proliferation and effector functions
including lymphokine secretion specific to HCV antigens which
comprises exposing cells of a vertebrate in vivo to the polynucleotide of
Claim 2.

21. A method of treating a patient in need of such
treatment comprising administering to the patient the polynucleotide of
Claim 2 in combination with interferon-alpha, Ribavirin, Zidovudine,
or other pharmaceutically acceptable antiviral agents.

22. A pharmaceutical composition comprising the
polynucleotide of Claim 2.

23. A method of inducing an immune response
comprising administering the polynucleotide of Claim 2 to a patient, the
administration of the polynucleotide antedating or coinciding or
following administration to the patient of a subunit, recombinant,
recombinant live vector, inactivated, recombinant inactivated vector, or
live attenuated HCV vaccine.

24. The vector of Claim 2 which is selected from
V1Ra.HCV1CorePAb, Vtpa.HCV1CorePAb, VUb.HCV1CorePAb,
V1Ra.HCV1Core, Vtpa.HCV1Core and VUb.HCV1Core.

25. A pharmaceutical composition comprising the vector
of Claim 21.

26. The DNA sequence of Claim 1 selected from the
group consisting of a nucleotide sequence shown in Figure 5, Figure 9,
Figure 10, Figure 11, Figure 12 and Figure 13.

FIG. 2

[Plasmid map. Restriction sites and positions recovered from the figure
labels: Spe I 103, ApaL I 230, Nde I 337, SnaB I 442, Nco I 464,
BstX I 836, Sap I 1215, Pvu II 1467, Hpa I 1521, Sca I 1551, Nco I 1616,
Pst I 1629, Bgl II 1676, Kpn I 1683, EcoR V 1690, EcoR I 1697,
Sal I 1704, Not I 1711, Sac II 1756, Pac I 1967, Sfi I 1975, Drd I 2093,
Hind III 2974, Ssp I 3169, Sma I 3220, Cla I 3402, Xho I 3494.
Inset sequence between Pst I 1629 and Bgl II 1676:
ATA ACC ATC GAT GCA ATG AAG AGA GGG CTC TGC TGT GTG CTG CTG
CTG TGT GGA GCA GTG TTC GTT TCG CCG AGC GAG ATC T]

FIG. 3

FIG. 4

CODON UTILIZATION IN HUMAN PROTEIN-CODING SEQUENCES

TOTAL 4285 RESIDUES EXCLUDING
N-TERMINAL METHIONINE RESIDUES

THERMOSTABLE LUCIFERASES AND METHODS OF PRODUCTION

The government may have rights to this invention based on support
provided by NIH 1R43 GM506 23-01 and 2R44 GM506 23-02 and NSF
ISI-9160613 and 111-9301865.

RELATED APPLICATIONS 

This application claims priority from copending U.S. Ser. No. 60/059,379 
filed September 19, 1997. 

FIELD OF THE INVENTION 

The invention is directed to mutant luciferase enzymes having greatly 
increased thermostability compared to natural luciferases or to luciferases from 
which they are derived as measured e.g. by half-lives of at least 2 hrs. at 50°C in 
aqueous solution. The invention is also drawn to polynucleotides encoding the 
novel luciferases, and to hosts transformed to express the luciferases. The 
invention is further drawn to methods of producing luciferases with increased 
thermostability and the use of these luciferases in any method in which previously 
known luciferases are conventionally employed. Some of the uses employ kits. 

BACKGROUND OF THE INVENTION 

Luciferases are defined by their ability to produce luminescence. Beetle 
luciferases form a distinct class with unique evolutionary origins and chemical 
mechanisms. (Wood, 1995) 

Although the enzymes known as beetle luciferases are widely recognized
for their use in highly sensitive luminescent assays, their general utility has been
limited due to low thermostability. Beetle luciferases having amino acid
sequences encoded by cDNA sequences cloned from luminous beetles are not
stable even at moderate temperatures. For example, even the most stable of the
luciferases, LucPpe2, obtained from a firefly has very little stability at the
moderate temperature of 37°C. Firefly luciferases are a sub-group of the beetle
luciferases. Historically, the term "firefly luciferase" referred to the enzyme
LucPpy from a single species Photinus pyralis (Luc+ is a version).

Attempts have been reported to mutate natural cDNA sequences encoding
luciferase and to select mutants for improved thermostability (White et al., 1994,
from P. pyralis, and Kajiyama and Nakano, 1993, from Luciola lateralis).
However, there is still a need to improve the characteristics and versatility of this
important class of enzymes.

SUMMARY OF THE INVENTION 

The invention is drawn to novel and remarkably thermostable luciferases,
including half-lives of at least 2 hrs. at 50°C or at least 5 hrs. at 50°C in aqueous
solution. The mutant luciferases of the present invention display remarkable and
heretofore unrealized thermostability at room temperature (22°C) and at
temperatures at least as high as 65°C. The invention is further directed to the
mutant luciferase genes (cDNA) which encode the novel luciferase enzymes. The
terminology used herein is, e.g. for the mutants isolated in experiment 90, plate
number 1, well B5, the E. coli strain is 90-1B5, the mutant gene is luc90-1B5, and
the mutated luciferase is Luc90-1B5.

By thermostability is meant herein the rate of loss of enzyme activity
measured at half life for an enzyme in solution at a stated temperature. Preferably,
for beetle luciferases, enzyme activity means luminescence measured at room
temperature under conditions of saturation with luciferin and ATP.
Thermostability is defined in terms of the half-life (the time over which 50% of
the activity is lost).

The invention further encompasses expression vectors and other genetic
constructs containing the mutant luciferases, as well as hosts, bacterial and
otherwise, transformed to express the mutant luciferases. The invention is also
drawn to compositions and kits which contain the novel luciferases, and use of
these luciferases in any methodology where luciferases are conventionally
employed.

Various means of random mutagenesis were applied to a luciferase gene
(nucleotide sequence), most particularly gene synthesis using an error-prone 
polymerase, to create libraries of modified luciferase genes. This library was 
expressed in colonies of E. coli and visually screened for efficient luminescence to 
select a subset library of modified luciferases. Lysates of these E. coli strains were 
then made, and quantitatively measured for luciferase activity and stability. From 
this, a smaller subset of modified luciferases was chosen, and the selected 
mutations were combined to make composite modified luciferases. New libraries 
were made from the composite modified luciferases by random mutagenesis and 
the process was repeated. The luciferases with the best overall performance were 
selected after several cycles of this process. 

Methods of producing improved luciferases include directed evolution 
using a polynucleotide sequence encoding a first beetle luciferase as a starting 
(parent) sequence, to produce a polynucleotide sequence encoding a second 
luciferase with increased thermostability, compared to the first luciferase, while 
maintaining other characteristics of the enzymes. A cDNA designated lucppe2
encodes a firefly luciferase derived from Photuris pennsylvanica that displays
increased thermostability as compared to the widely utilized luciferase designated
LucPpy from Photinus pyralis. The cDNA encoding LucPpe2 luciferase was
isolated, sequenced and cloned (see Leach, et al. 1997). A mutant of this gene
encodes a first luciferase LucPpe2 [T249M].

In an embodiment of a mutant luciferase, the amino acid sequence is that of
LucPpe2 shown in FIG. 45 with the exception that at residue 249 there is an M
(designated T249M) rather than the T reported by Leach et al. The bold,
underlined residue (249) shows mutation from T to M. This enzyme produced
approximately 5-fold more light in vivo when expressed in E. coli.
Double-underlined residues were randomized by oligonucleotide mutagenesis.

Diluted extracts of recombinant E. coli that expressed mutant luciferases
made by the methods of the invention were simultaneously screened for a plurality
of characteristics including light intensity, signal stability, substrate utilization
(Km), and thermostability. A fully automated robotic system was used to screen
large numbers of mutants in each generation of the evolution. After several cycles
of mutagenesis and screening, thereby creating mutant libraries of luciferases, an
increased thermostability compared to LucPpe2 [T249M] of about 35°C was
achieved for the most stable clone (clone Luc90-1B5), which also essentially
maintained thermostability (there was only negligible loss in activity of 5%) when
kept in aqueous solution over 2 hrs. at 50°C, 5 hours at 65°C, or over 6 weeks at
22°C.

Mutant luciferases of the present invention display increased
thermostability for at least 2 hrs. at 50°C, preferably at least 5 hrs. at 50°C in the
range of 2-24 hrs. at 50°-65°C. In particular, the present invention comprises
thermostable mutant luciferases which, when solubilized in a suitable aqueous
solution, have a stability half-life greater than about 2 hours at about 50°C, more
preferably greater than about 10 hours at 50°C, and more preferably still greater
than 5 hours at 50°C. The present invention also comprises mutant luciferases
which, when solubilized in a suitable aqueous solution, have a stability half-life
greater than about 5 hours at about 60°C, more preferably greater than about 10
hours at about 60°C, and more preferably still greater than about 24 hours at about
60°C. The present invention further comprises mutant luciferases which, when
solubilized in a suitable aqueous solution, have a stability half-life greater than
about 3 months at about 22°C, and more preferably a half-life stability of at least 6
months at 22°C. An embodiment of the invention is a luciferase mutant having
stability for 6 hours at 65°C (equivalent to a half-life of 2 days). A loss of activity of
about 5-6% was found. The half-lives of enzymes from the most stable clones of
the present invention, extrapolated from data showing small relative changes, are 2
days at 65°C (corresponding to 6% loss over 6 hours), and 2 years at 22°C
(corresponding to 5% loss over 6 weeks).
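
The text does not state how the half-lives were extrapolated from the measured losses. As an illustration only, the Python sketch below assumes simple first-order (exponential) decay of activity, which is an assumption of this example rather than a method taken from the text; the function name `half_life` is likewise hypothetical. Under that assumption, a 6% loss over 6 hours corresponds to roughly 2.8 days and a 5% loss over 6 weeks to roughly 1.6 years, the same order of magnitude as the quoted figures.

```python
import math


def half_life(fraction_lost, elapsed):
    """Half-life in the units of `elapsed`, ASSUMING first-order decay.

    If activity falls as A(t) = A0 * exp(-k * t), then
    k = -ln(1 - fraction_lost) / elapsed and the half-life is ln(2) / k.
    """
    if not 0 < fraction_lost < 1:
        raise ValueError("fraction_lost must be between 0 and 1")
    return elapsed * math.log(2) / -math.log(1 - fraction_lost)


# 6% loss over 6 hours at 65 degrees C, expressed in days.
days_at_65 = half_life(0.06, 6) / 24

# 5% loss over 6 weeks at 22 degrees C, expressed in years.
years_at_22 = half_life(0.05, 6) * 7 / 365

print(f"{days_at_65:.1f} days, {years_at_22:.1f} years")
```

Because small losses are being extrapolated a long way, the result is sensitive to measurement error: a one-point change in the measured loss shifts the estimated half-life substantially.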

In particular, the invention comprises luciferase enzymes with
embodiments of amino acid sequences disclosed herein (e.g. mutant luciferases
designated Luc49-7C6; Luc78-0B10; and Luc90-1B5, FIGS. 27, 36, 43) as well as
all other beetle luciferases that have thermostability as measured in half-lives of at
least 2 hours at 50°C. The invention also comprises mutated polynucleotide
sequences encoding luciferase enzymes containing any single mutation or any
combination of mutations of the type and positions in a consensus region of beetle
luciferase encoding sequences, disclosed herein, or the equivalents. The mutations
are indicated in the sequences in FIGS. 22-47 by bold, underlined residues and are
aligned with other beetle luciferase sequences in FIG. 19.

Nucleotide sequences encoding beetle luciferases are aligned in FIG. 19.
Eleven sequences found in nature in various genera and species within genera are
aligned, including lucppe-2. Nucleotide sequences encoding three mutant
luciferases of the present invention (Luc49-7C6; 78-0B10; 90-1B5) are also
aligned. There are at least three mutations in each mutant luciferase that show
increased thermostability. In general, mutations are not in the conserved regions.
Conserved amino acids are those that are identical in all natural species at
positions shown in FIG. 19. Consensus refers to the same amino acid occurring at
more than 50% of the sequences shown in FIG. 19, excluding LucPpe2.
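
The two definitions above (conserved: identical in all sequences; consensus: the same amino acid in more than 50% of the sequences) can be illustrated with a short Python sketch. The function name, the toy alignment in the test, and the treatment of gap characters are assumptions made for this example; the actual alignment is the one in FIG. 19, and the caller is expected to pass only the sequences the definition covers (for consensus, the text excludes LucPpe2).

```python
from collections import Counter


def classify_columns(aligned):
    """Label each alignment column as conserved, consensus, or variable.

    `aligned` is a list of equal-length aligned amino acid strings, with
    '-' marking a gap (gap handling here is an assumption of this sketch).
    Returns a list of (label, residue) tuples, residue being None when
    no single amino acid reaches the required frequency.
    """
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    labels = []
    for column in zip(*aligned):
        residue, count = Counter(column).most_common(1)[0]
        if residue == "-":
            labels.append(("variable", None))
        elif count == len(aligned):
            labels.append(("conserved", residue))      # identical in all
        elif count > len(aligned) / 2:
            labels.append(("consensus", residue))      # more than 50%
        else:
            labels.append(("variable", None))
    return labels
```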

DETAILED DESCRIPTION OF THE INVENTION 

The invention relates to beetle luciferases that are characterized by high
thermostability and are created by mutations made in the encoding genes,
generally by recursive mutagenesis. The improved thermostability allows storage
of luciferases without altering their activity, and improves reproducibility and
accuracy of assays using the new luciferases. The invention further comprises
isolated polynucleotide sequences (cDNAs) which encode the mutant luciferases
with increased thermostability, vectors containing the polynucleotide sequences,
and hosts transformed to express the polynucleotide sequences. Table 1 shows
results of about 250 clones and characteristics of the luciferases from the clones
including thermostability. The invention also encompasses the use of the mutant
luciferases in any application where luciferases are conventionally utilized, and
kits useful for some of the applications.

Unexpectedly, beetle luciferases with the sought after high thermostability
were achieved in the present invention through a process of recursive mutagenesis
and selection (sometimes referred to as "directed evolution"). A strategy of
recursive mutagenesis and selection is an aspect of the present invention, in
particular the use of multi-parameter automated screens. Thus, instead of
screening for only a single attribute such as thermostability, simultaneous
screening was done for additional characteristics of enzyme activity and
efficiency. By this method, one property is less likely to "evolve" at the expense
of another, resulting in increased thermostability, but decreased activity, for
example.

Table 1 presents examples of parameter values (Li, Tau, and S) derived
from experiments using different luciferases as starting (parent) sequences. The
subtitles refer to designations of the temperature at which the parameters
were measured and the starting luciferase, e.g., "39-5B10 at 51°C" and so forth.
All parameters in each experiment are recorded as values relative to the respective
starting sequence, e.g., the parameter values for the starting sequence in any
experiment equal "1". (See Example 2 herein for definitions.)
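
The relative-value convention described above can be sketched as follows. This is only an illustration, assuming each parameter is simply divided by the value measured for the parent; the function name and all numbers are hypothetical and are not taken from Table 1.

```python
def relative_to_parent(clone, parent):
    """Express each screening parameter of a clone relative to its parent.

    Assumes plain division by the parent's value, so the parent itself
    scores 1 for every parameter.
    """
    return {name: clone[name] / parent[name] for name in parent}


# Hypothetical raw measurements (arbitrary units), for illustration only.
parent = {"Li": 2000.0, "Tau": 4.0, "S": 0.5}
mutant = {"Li": 3000.0, "Tau": 10.0, "S": 0.5}

print(relative_to_parent(parent, parent))  # the parent scores 1 throughout
print(relative_to_parent(mutant, parent))
```

Because every experiment is normalized to its own parent, values from different experiments are not directly comparable without re-measuring against a common reference.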

Thermostability has evolved in nature for various enzymes, as evidenced
by thermostable isozymes found in thermophilic bacteria. Natural evolution
works by a process of random mutagenesis (base substitutions, gene deletions,
gene insertions), followed by selection of those mutants with improved
characteristics. The process is recursive over time. Although the existence of
thermostable enzymes in nature suggests that thermostability can be achieved
through mutagenesis on an evolutionary scale, the feasibility of achieving a given
level of thermostability for a particular class of enzymes by using short term
laboratory methods was unpredictable. The natural process of evolution by
mutation and selection, which generally involves extremely large populations and
many millions of generations and genes, cannot be used to predict the capabilities
of a modern laboratory to produce improved genes by directed evolution until such
mutants are produced.

After such success, since the overall three-dimensional structures of all
beetle luciferases are quite similar, having shown it possible for one member of
this class makes it predictable that high thermostability can be achieved for other
beetle luciferases by similar methods. FIG. 17 shows the evolutionary relationship
among beetle luciferases. All of these have a similar overall architecture. The
structural class to which the beetle luciferases belong is determined by the
secondary structure (e.g., helices are symbolized by cylinders, sheets by collections
of arrows, and loops connect helices with sheets) (FIG. 18A). FIG. 18B shows the
amino acids of the LucPpe2 luciferase, wherein small spirals
correspond to cylinders of FIG. 18A; FIG. 18C shows that the general beetle
architecture matches (is superimposed on) that of LucPpe2. This is support for the
expectation that the methods of the present invention may be generalized to all
beetle luciferases:

Enzymes belong to different structural classes based on the three-dimensional
arrangement of secondary elements such as helices, sheets, and loops.
Thermostability is determined by how efficiently the secondary elements are
packed together into a three-dimensional structure. For each structural class, there
also exists a theoretical limit for thermostability. All beetle luciferases belong to a
common structural class as evidenced by their common ancestry (FIG. 17),
homologous amino acid sequences, and common catalytic mechanisms.

The application of a limited number of amino acid substitutions by
mutagenesis is unlikely to significantly affect the overall three-dimensional
architecture (i.e., the structural class for mutant luciferases is not expected to
change). Because the theoretical limit for thermostability for any structural class
is not known, the potential thermostability of beetle luciferases was not known
until demonstrations of the present invention.

A priori difficulties in achieving the goals of the present invention
included:

1. The types of mutations which can be made by laboratory methods
are limited.

i) By random point mutation (e.g., by error-prone PCR), more than one
base change per codon is rare. Thus, most potential amino acid
changes are rare.

ii) Other types of random genetic changes are difficult to achieve for
areas greater than 100 bp (e.g., random gene deletions or
insertions).

2. The number of possible luciferase mutants that can be screened is
limited.

i) Based on sequence comparisons of natural luciferases, ignoring
deletions and insertions, more than 10^189 functional enzyme
sequences may be possible.

ii) If 100,000 clones could be screened per day, it would require
more than 10^179 centuries to screen all possible mutants, assuming
the same mutant was never screened twice (actual screening rate for
the present invention was less than 5000 per day).

3. The probability of finding functional improvement requiring
cooperative mutations is low (the probability of finding a specific cooperative pair
is 1 out of 10^8 clones).
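
The screening-time argument in item 2 can be checked with a short calculation. The sketch below works with base-10 exponents; the function name and the 365.25-day year are assumptions made for illustration, and no particular number of possible sequences is built in.

```python
import math

DAYS_PER_CENTURY = 36525  # assumes 100 years of 365.25 days


def log10_centuries_to_screen(log10_sequences, clones_per_day):
    """Return log10 of the centuries needed to screen every sequence once.

    Works with exponents so that astronomically large sequence counts
    stay readable; assumes no mutant is ever screened twice.
    """
    return log10_sequences - math.log10(clones_per_day * DAYS_PER_CENTURY)


# How much each screening rate mentioned in the text lowers the exponent.
for rate in (100_000, 5_000):
    reduction = -log10_centuries_to_screen(0, rate)
    print(f"{rate} clones/day lowers the exponent by about {reduction:.2f}")
```

Even at 100,000 clones per day the exponent falls by fewer than ten, so an exhaustive screen remains far out of reach for any sequence space of the size discussed here.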

Thus, even if the theoretical limits of thermostability were known, since 
only a very small number of the possible luciferase mutants can be screened, the a 
priori probability of finding such a thermostable enzyme was low. 

However, the present invention now shows that it is possible and feasible to
create novel beetle luciferases having high thermostability.

a) The approximately 250 mutants produced by methods of the present
invention, wherein the initial sequence was from LucPpe2 and
LucPpe, demonstrate that it is possible and feasible for at least one
member of this enzyme class to achieve high thermostability.

b) Any beetle luciferase should be improved by similar means since
the luciferases belong to the same structural class.

i) Since all beetle luciferases belong to the same structural
class, they also share in the same pool of potentially
stabilizing mutations (this conclusion is supported by the
observation that a high percentage of the stabilizing
mutations found in the clones of the present invention were
conversions to "consensus amino acids" in other beetle
luciferases, that is, amino acids that appear in the majority of
beetle luciferase sequences; see FIG. 19).

ii) Similar results were achieved using another beetle luciferase
from the luminous beetle Pyrophorus plagiophthalamus
(LucPplYG). The wild-type LucPplYG has 48% sequence
identity to the wild-type LucPpe2. Although the
thermostability of the LucPplYG mutants was less than that of the
LucPpe2 mutants described herein, this is because they were
subjected to fewer cycles of directed evolution. Also, in
some instances, mutants were selected with less emphasis
placed on their relative thermostability. The most stable
clone resulting from this evolution (Luc80-5E5) has a
half-life of roughly 3.8 hours at 50°C.

To compensate for a statistical effect caused by the large number of
deleterious random mutations expected relative to the beneficial mutations,
methods were employed to maximize assay precision and to re-screen previously
selected mutations in new permutations. Among the methods for maximizing
assay precision were closely controlling culture conditions by using specialized
media, reducing growth rates, controlling heat transfer, and analyzing parameters
from mid-logarithmic phase growth of the culture; controlling mixing, heat
transfers, and evaporation of samples in the robotic screening process; and
normalizing data to spatially distributed control samples. New permutations of the
selected mutations were created by a method of DNA shuffling using
proof-reading polymerases.

The difficulty in predicting the outcome of the recursive process is
exemplified by the variable success with the other characteristics of luciferase that
were also selected for. Although the primary focus was on the enzyme
thermostability, selection for mutants with brighter luminescence, more efficient
substrate utilization, and an extended luminescence signal was also attempted.
The definitions are given by equations herewith. The selection process was
determined by changes relative to the parent clones for each iteration of the
recursive process. The amount of the change was whatever was observed during
the screening process. The expression of luciferase in E. coli was relatively
inefficient for LucPpe2 compared to Luc+. Other luciferases varied (see
FIG. 21).

To improve the overall efficiency of substrate utilization, reduction in the
composite apparent utilization constant (i.e., Km-[ATP+luciferin]) for both
luciferin and ATP was sought. Although there was an unexpected systematic
change in each utilization constant, there was little overall change. Finally, the
luminescence signal could only be moderately affected without substantially
reducing enzyme efficiency. Thus, while the enzyme thermostability was greatly
increased by methods of the present invention, other characteristics of the enzyme
were much less affected.

FIGS. 48-53 present other results of the mutant luciferases. Compositions
of the invention include luciferases having greater than the natural level of
thermostability. Each mutant luciferase is novel, because its individual
characteristics have not been reported. Specific luciferases are known by both
their protein and gene sequences. Many other luciferases were isolated that have
increased, high thermostability, but whose sequences are not known. These
luciferases were identified during the directed evolution process, and were
recognized as distinct by their enzymological characteristics.

A luciferase which is much more stable than any of the luciferase mutants
previously described is designated as mutant Luc90-1B5. New thermostable
mutants were compared to this particularly stable luciferase. The mutant
luciferases of the present invention display remarkable and heretofore unrealized
thermostability at temperatures ranging from 22°C (room temperature) to at least
as high as 65°C.

Other aspects of the invention include methods that incorporate the
thermostable luciferases, specifically beetle luciferases having high
thermostability.

Production of Luciferases of the Present Invention

The method of making luciferases with increased thermostability is
recursive mutagenesis followed by selection. Embodiments of the highly
thermostable mutant luciferases of the invention were generated by a reiterative
process of random point mutations beginning with a source nucleotide sequence,
e.g., the LucPpe2[T249M] cDNA. Recombination mutagenesis is a part of
the mutagenesis process, along with point mutagenesis. Both recombination
mutagenesis and point mutagenesis are performed recursively. Because the
mutation process causes recombination of individual mutants in a fashion similar
to the recombination of genetic elements during sexual reproduction, the process is
sometimes referred to as the sexual polymerase chain reaction (sPCR). See, for
instance, Stemmer, U.S. Patent No. 5,605,793, issued February 25, 1997.

Taking the LucPpe2 luciferase cDNA sequence as a starting point, the
gene was mutated to yield mutant luciferases which are far more thermostable. A
single point mutation to the LucPpe2 sequence yielded the luciferase whose
sequence is depicted as T249M. This mutant is approximately 5 times brighter in
vivo than LucPpe2, so it was utilized as a template for further mutation. It
was also used as a baseline for measuring the thermostability of the other mutant
luciferases described herein.

Embodiments Of Sequences Of Luciferases Of The Present Invention 

FIG. 45 shows the amino acid sequence of the LucPpe2 luciferase
T249M. The sequence contains a single base pair mutation at position T249 to M
(bold, underlined) which distinguishes it from the sequence reported by Leach et
al. (1997). This clone has a spectral maximum of 552 nm, which is yellow
shifted from that of the Luc of Leach. This mutant was selected for use as an
original template in some of the Examples because it is approximately 5 times
brighter in vivo than the form reported by Leach et al., which allowed for more
efficient screening by the assay. These sequences show changes from the starting
sequence (T249M) in bold face. Note that "x" in the sequence denotes an
ambiguity in the sequence.

WO 99/14336 PCT/US98/19494



Directed Evolution. A Recursive Process 

Directed evolution is a recursive process of creating diversity through
mutagenesis and screening for desired changes. For enzymological properties that
result from the cumulative action of multiple amino acids, directed evolution
provides a means to alter these properties. Each step of the process will typically
produce small changes in enzyme function, but the cumulative effect of many
rounds of this process can lead to substantial overall change.

The characteristic "thermostability" is a candidate for directed evolution
because it is determined by the combined action of many of the amino acids
making up the enzyme structure. While screening for increased thermostability of
luciferase, luminescence output and efficiency of substrate binding were also
screened. This was to ensure that changes in thermostability did not also produce
undesirable changes in other important enzymological properties.

Because the frequency of deleterious mutations is much greater than that of
useful mutations, it is likely that undesirable clones are selected in each screen
within the precision limits of the present invention. To compensate for this, the
screening strategy incorporated multiple re-screens of the initially selected
mutations. However, before re-screening, the selected mutations were "shuffled"
to create a library of random intragenetic recombinations. This process allows
beneficial mutations among different clones to be recombined together into fewer
common coding sequences, and allows deleterious mutations to be unlinked,
segregated, and omitted. Thus, although essentially the same set of selected
mutations was screened again, they were screened under different permutations as
a result of the recombination or shuffling.

Although results of each step of the evolutionary process were assayed by
quantitative measurements, these measurements were made in cell
lysates rather than in purified enzymes. Furthermore, each step only measured
changes in enzyme performance relative to the prior step, so global changes in
enzyme function were difficult to judge. To evaluate the impact of directed
evolution on enzyme function, clones from the beginning, middle and end of the
process (Table 2) were purified and analyzed. The clones selected for this analysis
were Luc[T249M], 49-7C6, and 78-0B10. Another clone, 90-1B5, created by a
subsequent strategy of oligonucleotide-directed mutagenesis and screening, was
also purified for analysis.

The effect of directed evolution on thermostability was dramatic. At high
temperatures, where the parent clone was inactivated almost instantaneously, the
mutant enzymes from the related clones showed stability over several hours (Table
1). Even at room temperature, these mutants are several fold more stable than the
parent enzyme. Subsequent analysis of 90-1B5 showed this enzyme to be the
most stable, having a half-life of 27 hours at 65°C when tested under the same
buffer conditions. With some optimization of buffer conditions, this enzyme
showed very little activity loss at 65°C over several hours (citrate buffer at pH 6.5;
FIG. 1A). This luciferase was stable at room temperature over several weeks
when incubated at pH 6.5 (FIG. 1B).

Kajiyama and Nakano (1993) showed that firefly luciferase from Luciola
lateralis was made more stable by a single amino acid substitution
at position A217, from alanine to either I, L, or V.
Substitution with leucine produced a luciferase that maintained 70% of its activity
after incubation for 1 hour at 50°C. All of the enzymes of the present invention
created through directed evolution are much more stable than this L. lateralis
mutant. The most stable clone, 90-1B5, maintains 75% activity after 120 hours (5
days) incubation under similar conditions (50°C, 25 mmol/L citrate pH 6.5, 150
mmol/L NaCl, 1 mg/mL BSA, 0.1 mmol/L EDTA, 5% glycerol). Interestingly, the
Luc reported by Leach already contains isoleucine at the homologous position
described for the L. lateralis mutant.
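
To put the two retained-activity figures on a common scale, they can be converted to half-lives. The sketch below assumes simple first-order (exponential) inactivation, which the text does not state; the function name is hypothetical.

```python
import math


def half_life_hours(fraction_remaining, hours):
    """Half-life implied by the fraction of activity left after `hours`.

    Assumes first-order inactivation:
    fraction_remaining = 0.5 ** (hours / half_life).
    """
    if not 0 < fraction_remaining < 1:
        raise ValueError("fraction_remaining must be between 0 and 1")
    return hours * math.log(0.5) / math.log(fraction_remaining)


# Figures quoted in the paragraph above (both at 50°C).
print(half_life_hours(0.70, 1))    # L. lateralis leucine mutant
print(half_life_hours(0.75, 120))  # clone 90-1B5
```

Under this assumption the L. lateralis mutant has a half-life of roughly 2 hours and 90-1B5 roughly 290 hours, although real inactivation curves need not be single-exponential.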

Although thermostability was the characteristic of interest, clones were
selected based on the other enzymological parameters in the screens. By selecting
clones having greater luminescence expression, mutants were found that yielded
greater luminescence intensity in colonies of E. coli. However, the process
showed little ability to alter the kinetic profile of luminescence by the enzymes.
This failure suggests that the ability to support steady-state luminescence is
integral to the catalytic mechanism, and is not readily influenced by a cumulative
effect of many amino acids.

Substrate binding was screened by measuring an apparent composite Km
(see Example 2) for luciferin and ATP. Although the apparent composite Km
remained relatively constant, later analysis showed that the individual Km's
systematically changed. The Km for luciferin rose while the Km for ATP declined
(Table 2). The reason for this change is unknown, although it can be speculated
that more efficient release of oxyluciferin or luciferin inhibitors could lead to more
rapid enzyme turnover.

Each point mutation on its own increases (to a greater or lesser extent) the
thermostability of the mutant enzyme beyond that of the wild-type luciferase. The
cumulative effect of combining individual point mutations yields mutant
luciferases whose thermostability is greatly increased from the wild-type, often by
an order of magnitude or more.

EXAMPLES 

The following examples illustrate the methods and compositions of the 
present invention and their embodiments. 

EXAMPLE 1: Producing Thermostable Luciferases Of The Present Invention

Mutagenesis Method:

An illustrative mutagenesis strategy is as follows:

From the "best" luciferase clone, that is, a clone with improved
thermostability and not appreciably diminished values for other parameters,
random mutagenesis was performed by three variations of error-prone PCR. From
each cycle of random mutagenesis, 18 of the best clones were selected. DNA was
prepared from these clones yielding a total of 54 clones. These clones represent
new genetic diversity.

These 54 clones were combined and recombination mutagenesis was
performed. The 18 best clones from this population were selected.

These 18 clones were combined with the 18 clones of the previous
population and recombination mutagenesis was performed. From this screening, a
new luciferase population of 18 clones was selected representing 6 groups of
functional properties.

In this screening the new mutations of the selected 54 clones, either in their
original sequence configurations or in recombinants thereof, were screened a
second time. Each mutation was analyzed on the average about 10 times. Of the
90 clones used in the recombination mutagenesis, it was likely that at least 10
were functionally equivalent to the best clone. Thus, the best clone or
recombinants thereof should be screened at least 100 times. Since this was greater
than the number of clones used in the recombination, there was significant
likelihood of finding productive recombination of the best clone with other clones.
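
The clone counts in this strategy can be tallied as in the sketch below. It rests on one assumed reading of the text, namely that the 90 recombined clones are the 54 clones of the first recombination plus the 18 + 18 clones of the second; the function name is hypothetical.

```python
def recombination_tally(pcr_variations=3, picks_per_screen=18,
                        screens_per_mutation=10, equivalents_of_best=10):
    """Tally the clone counts quoted in the mutagenesis strategy."""
    # Three error-prone PCR variations, 18 best clones kept from each.
    new_diversity = pcr_variations * picks_per_screen
    # Assumed reading: first recombination pool (54) plus the two groups
    # of 18 clones combined for the second recombination.
    total_recombined = new_diversity + 2 * picks_per_screen
    # Each mutation seen about 10 times, about 10 clones equal to the best.
    screens_of_best = screens_per_mutation * equivalents_of_best
    return new_diversity, total_recombined, screens_of_best


print(recombination_tally())
```

With the default values this returns 54, 90 and 100, the three figures used in the paragraphs above.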

Robotic Processing Methods:

Heat transfers were controlled in the robot process by using thick
aluminum at many positions where the 96-well plates were placed by the robotic
arm. For example, all shelves in the incubators or refrigerator were constructed
from 1/4 inch aluminum. One position in particular, located at room temperature,
was constructed from a block of aluminum of dimensions 4.5 x 7 x 6.5 inches.
When any 96-well plate was moved from a high temperature (e.g., incubators) or
low temperature (e.g., refrigerator) to a device at room temperature, it was first
placed on the large aluminum block for temperature equilibration. By this means,
the entire plate would rapidly reach the new temperature, thus minimizing unequal
evaporation for the various wells in the plate due to temperature differences. Heat
transfers in a stack of 96-well plates placed in an incubator (e.g., for overnight
growth of E. coli) were controlled by placing 1 mm thick sheets of aluminum
between the plates. This allowed for more efficient heat transfer from the edges of
the stack to the center. Mixing in the robotic process was controlled by having the
plate placed on a shaker for several seconds after each reagent addition.

Please refer to FIG. 14 for a schematic of the order in which the plates are
analyzed (FIG. 15) and a robotic apparatus which can be programmed to perform
the following functions:

Culture Dilution Method. A plate (with lid) containing cells is placed on a shaker
and mixed for 3-5 minutes.

A plate (with lid) is retrieved from a carousel and placed in the reagent
dispenser. 180 µl of media is added after removing the lid and placing it on the
locator near the pipetter. The plate is then placed in the pipetter.

The plate on the shaker is placed in the pipetter, and the lid removed and
placed on the locator. Cells are transferred to the new plate using the pipetting
procedure (see "DILUTION OF CELLS INTO NEW CELL PLATE").

The lids are replaced onto both plates. The new plate is placed in the
refrigerator and the old plate is returned to the carousel.

Luminescence Assay Method. A plate containing cells is retrieved from the
carousel and placed on the shaker for 3-5 minutes to fully mix the cells, since the
cells tend to settle from solution upon standing.

To measure Optical Density (O.D.), the plate is moved from the shaker to
the locator near the luminometer; the lid is removed and the plate placed into the
luminometer. The O.D. is measured using a 620 nm filter.

When it is finished, the plate is then placed in the refrigerator for storage.

The above steps are completed for all plates before proceeding with
subsequent processing.

To prepare a cell lysate, the plate of cells is first retrieved from the
refrigerator and mixed on the shaker to resuspend the cells. A new plate from the
carousel without a lid is placed in the reagent dispenser and 20 µl of Buffer A is
added to each well. This is placed in the pipetting station.

The plate of cells in the shaker is placed in the pipetting station. A
daughter plate of cells is prepared using the pipetting procedure (see "PIPETTING
CELLS INTO THE LYSIS PLATE").

After pipetting, the new daughter plate is placed on the shaker for mixing.
The plate is returned to its original position in the carousel.

After mixing, the Lysate Plate is placed into the CO2 freezer to freeze the
samples. The plate is then moved to the thaw block to thaw for 10 minutes.

The plate is then moved to the reagent dispenser to add 175 µl of Buffer B,
and then mixed on the shaker for about 15 minutes or more. The combination of
the freeze/thaw and Buffer B will cause the cells to lyse.
A new plate with a lid from the carousel is used to prepare the dilution
plate from which all assays will be derived. The plate is placed in the reagent
dispenser and the lid removed to the locator near the pipetter. 285 µl of Buffer C
is added to each well with the reagent dispenser, then the plate is placed in the
pipetting station.

The Lysate Plate in the shaker is moved to the pipetting station and the
pipetting procedure (see "DILUTION FROM LYSIS PLATE TO INCUBATION
PLATE") is used. After pipetting, the new daughter plate is placed on the shaker
for mixing. The Lysate Plate is discarded.

Two white assay plates are obtained from the plate feeder and placed in the
pipetter. The incubation plate from the shaker is placed in the pipetter, and the lid
removed and placed on the nearby locator. Two daughter plates are made using the
pipetting procedure (see "CREATE PAIR OF DAUGHTER PLATES FROM
INCUBATION PLATE"). Afterwards, the lid is replaced on the parent plate, and
the plate is placed in a high temperature incubator [ranging from 31° to about
65°C, depending on the clone].

One daughter plate is placed in the luminometer and the 1x ASSAY
METHOD is used. After the assay, the plate is placed in the ambient incubator,
and the second daughter plate is placed in the luminometer. For the second plate,
the 0.02x ASSAY METHOD is used. This plate is discarded, and the first plate is
returned from the incubator to the luminometer. The REPEAT ASSAY method is
used (i.e., no reagent is injected). Afterwards, the plate is again returned to the
ambient incubator.

The above steps are completed for all plates before proceeding with
processing.

To begin the second set of measurements, the plate from the high
temperature incubator is placed in the shaker to mix.

The plate in the ambient incubator is returned to the luminometer and the
REPEAT ASSAY method is again used. The plate is returned afterwards to the
ambient incubator.

Two white assay plates again are obtained from the plate feeder and placed
in the pipetter. The plate on the shaker is placed in the pipetter, and the lid
removed and placed on the nearby locator. Two daughter plates are again made
using the pipetting procedure (see "CREATE PAIR OF DAUGHTER PLATES
FROM INCUBATION PLATE"). Afterwards, the lid is replaced on the parent
plate, and the plate is returned to the high temperature incubator.

One daughter plate is placed in the luminometer and the 1x ASSAY
METHOD is again used. The plate is discarded after the assay. The second
daughter plate is then placed in the luminometer and the 0.06x ASSAY METHOD
is used. This plate is also discarded.

The above steps are completed for all plates before proceeding with
processing.

In the final set of measurements, the plate from the high temperature
incubator is again placed in the shaker to mix.

The plate in the ambient incubator is returned to the luminometer and the
REPEAT ASSAY method is again used. The plate is discarded afterwards.

One white assay plate is obtained from the plate feeder and placed in the
pipetter. The plate from the shaker is placed in the pipetter, and the lid removed
and placed on the nearby locator. One daughter plate is made using the pipetting
procedure (see "CREATE SINGLE DAUGHTER PLATE FROM INCUBATION
PLATE"). The lid is replaced on the parent plate and the plate is discarded.

The daughter plate is placed in the luminometer and the 1x ASSAY
METHOD is used. The plate is discarded after the assay.

Buffers:



Buffer A:
25 mM K2HPO4
0.5 mM CDTA
0.1% Triton X-100

Buffer B:
X CCLR (Promega E153A)
1.25 mg/ml lysozyme
0.04% gelatin

Buffer C:
10 mM HEPES
150 mM NaCl
1 mg/ml BSA
5% glycerol
0.1 mM EDTA

1X Assay reagent:
5 µM Luciferin
175 µM ATP
20 mM Tricine, pH 8.0
0.1 mM EDTA

0.02X Assay reagent:
1:50 dilution of 1X Assay reagent

0.06X Assay reagent:
1:150 dilution of 1X Assay reagent

Pipetting Procedures
PIPETTING CELLS INTO THE LYSIS PLATE
Non-aseptic procedure using fixed tips

On the pipetter deck:

- Plate containing approximately 200 µl of cells without lid

- Lysate Plate containing 20 µl of Buffer A

Procedure:

1. Move the tips to the washing station and wash with 1 ml.

2. Move to the cell plate and withdraw 60 µl.

3. Move to the Lysate Plate and dispense 45 µl.

4. Repeat steps 1-3 for all 96 samples.

5. At the conclusion of the procedure, step 1 is repeated to clean the tips.

Post-procedure:

- Place Lysate Plate onto the shaker.

- Place lid on plate with cells and place on carousel.

- Place Lysate Plate into the CO2 freezer.



DILUTION FROM LYSIS PLATE TO INCUBATION PLATE

On the pipetter deck:

- Lysate Plate containing 240 µl of lysate

- Incubation Plate without lid containing 285 µl of Buffer C

Procedure:

1. Move the tips to the washing station and wash with 0.5 ml.

2. Move to the Lysate Plate and withdraw 30 µl.

3. Move to the Incubation Plate and dispense 15 µl by direct contact with
the buffer solution.

4. Repeat steps 1-3 for all 96 samples.

5. At the conclusion of the procedure, step 1 is repeated to clean the tips.

Post-procedure:

- Place Incubation Plate on shaker.

- Discard Lysate Plate.
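
The dilutions implied by the volumes in these procedures can be worked out as in the sketch below. The helper name is hypothetical, and the calculation assumes the dispensed volume mixes completely with the liquid already in the well.

```python
def dilution_factor(transfer_ul, diluent_ul):
    """Fraction of the final well volume contributed by the transferred sample."""
    return transfer_ul / (transfer_ul + diluent_ul)


# Cells into the Lysate Plate: 45 µl of cells, 20 µl Buffer A, 175 µl Buffer B.
cells_in_lysate = dilution_factor(45, 20 + 175)

# Lysate into the Incubation Plate: 15 µl of lysate into 285 µl Buffer C.
lysate_in_incubation = dilution_factor(15, 285)

print(cells_in_lysate)                         # 45 of 240 µl
print(lysate_in_incubation)                    # 15 of 300 µl, i.e. 1:20
print(cells_in_lysate * lysate_in_incubation)  # culture to incubation plate
```

The 45 + 20 + 175 µl total agrees with the 240 µl of lysate listed for the Lysate Plate on the pipetter deck.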

CREATE PAIR OF DAUGHTER PLATES FROM INCUBATION PLATE

This procedure is done twice.

On the pipetter deck:

- Incubation Plate containing 100-300 µl of solution without lid

- Two empty Assay Plates (white)

Procedure:

1. Move the tips to the washing station and wash with 0.5 ml.

2. Move to the Incubation Plate and withdraw 50 µl.

3. Move to the first Assay Plate and dispense 20 µl.

4. Move to the second Assay Plate and dispense 20 µl.

5. Repeat steps 1-4 for all 96 samples.

6. At the conclusion of the procedure, step 1 is repeated to clean the tips.

Post-procedure:

1. Replace lid on Incubation Plate.

2. Place Incubation Plate in incubator.

3. Place first Assay Plate in luminometer.

4. Place second Assay Plate on carousel.



CREATE SINGLE DAUGHTER PLATE FROM INCUBATION PLATE

On the pipetter deck:

- Incubation Plate containing 100-300 µl of solution without lid

- Empty Assay Plate (white)

Procedure:

1. Move the tips to the washing station and wash with 0.5 ml.

2. Move to the Incubation Plate and withdraw 40 µl.

3. Move to the Assay Plate and dispense 20 µl.

4. Repeat steps 1-3 for all 96 samples.

5. At the conclusion of the procedure, step 1 is repeated to clean the tips.

Post-procedure:

- Discard Incubation Plate and its lid.

- Place Assay Plate in luminometer.

DILUTION OF CELLS INTO NEW CELL PLATE 
Aseptic procedure using fixed tips 
On the pipetter deck : 

- plate containing approximately 200 ^il of cells without lid 

- new cell plate containing 180 |j.l of Growth Medium without lid 

Procedure: 

1 . Move to the cell plate and withdraw 45 p.1. 

2. Move to the Cell Plate and dispense 20 \x\ volume by direct liquid-to- 
liquid transfer. 

3. Move to waste reservoir an expel excess cells. 

4. Move to isopropanol wash station aspirate isopropanol to sterilize tips. 

5. Move to wash station, expel isopropanol and wash tips. 

6. Repeat steps 1-4 for all 96 samples. 

Post-procedure: 

1 . Replace lid on original plate of cells and place onto carousel. 

2. Replace lid on new cell plate and place into refrigerator. 

Notes : 

This procedure is used to prepare the cell plates used in the main analysis 
procedure. 
180 µl of Growth Medium is added by the reagent dispenser to each of the new cell plates just prior to initiating the pipetting procedure.

The dispenser is flushed with 75% isopropanol before priming with medium.

The medium also contains selective antibiotics to reduce potential contamination.

Luminometer Procedures

1X ASSAY METHOD

- place plate into luminometer

1. Inject 100 µl of 1X Assay reagent

2. Measure luminescence for 1 to 3 seconds

3. Repeat for next well

- continue until all wells are measured



0.02X ASSAY METHOD

- place plate into luminometer

1. Inject 100 µl of 0.02X Assay reagent

2. Measure luminescence for 1 to 3 seconds

3. Repeat for next well

- continue until all wells are measured

0.06X ASSAY METHOD

- place plate into luminometer

1. Inject 100 µl of 0.06X Assay reagent

2. Measure luminescence for 1 to 3 seconds

3. Repeat for next well

- continue until all wells are measured



REPEAT ASSAY 
- place plate into luminometer

1. Measure luminescence for 1 to 3 seconds

2. Repeat for next well

- continue until all wells are measured



IN VIVO SELECTION METHOD 
5-7 nitrocellulose disks, 200-500 colonies per disk (1000-3500 colonies total), are screened per 2 microplates (176 clones). The clones are screened at high temperatures using standard screening conditions.

8 positions in each microplate are reserved for a reference clone using the "best" luciferase (the parent clone for random mutagenesis and codon mutagenesis). The positions of the reserved wells are shown as "X" below.



XooooooooooX
oooooooooooo
oooXooooXooo
oooooooooooo
oooooooooooo
oooXooooXooo
oooooooooooo
XooooooooooX

The reference clones are made by placing colonies from DNA transformed from the parent clone into the reference wells. (To identify these wells prior to inoculation of the microplate, the wells are marked with a black marking pen on the bottom of each well.)
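As an illustration only, the reserved-well layout above can be turned into well names. This is a minimal sketch that assumes rows run A-H from top to bottom and columns 1-12 from left to right; the source does not state the labeling, and the names `LAYOUT` and `reference_wells` are hypothetical.

```python
# Reserved reference-clone wells, transcribed from the layout shown above.
# Assumption: rows are A-H top to bottom, columns are 1-12 left to right.
LAYOUT = [
    "XooooooooooX",
    "oooooooooooo",
    "oooXooooXooo",
    "oooooooooooo",
    "oooooooooooo",
    "oooXooooXooo",
    "oooooooooooo",
    "XooooooooooX",
]


def reference_wells(layout=LAYOUT):
    """Return the well names (e.g. 'A01') that are marked 'X' in the layout."""
    wells = []
    for r, row in enumerate(layout):
        for c, mark in enumerate(row):
            if mark == "X":
                wells.append(f"{'ABCDEFGH'[r]}{c + 1:02d}")
    return wells


REFERENCE_WELLS = reference_wells()
# Wells left over for clones on one 96-well microplate.
SAMPLE_WELLS_PER_PLATE = 96 - len(REFERENCE_WELLS)
```

With 8 reserved wells per plate, two microplates leave 176 wells for clones, which agrees with the figure given in the text.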

SCREENING SELECTION CRITERIA 

The following were used to screen. Criterion 1 is achieved manually; data for criteria 2-6 are generated by robotic analysis. For all criteria, the maximum values as described are selected.

1. In vivo screen. The brightest clones are selected at an elevated temperature.

2. Expression/specific activity. The values of normalized luminescence are calculated as the ratio of luminescence to optical density. The values are reported as the ratio with the reference value.

3. Enzyme stability. Measurements of normalized luminescence of the incubated samples (3 taken over about 15 hours) are fitted to ln(L) = ln(L0) - (t/τ), where L is normalized luminescence, t is time, and τ is a measure of the enzyme stability. The value is reported as the ratio with the reference value, and the correlation coefficients are calculated.

4. Substrate binding. Measurements of normalized luminescence with 1x and 0.02x are taken at the initial reading set, and 1x and 0.06x are taken at the 5 hour set. The ratios 0.02x:1x and 0.06x:1x give the relative luminescence at 0.02x and 0.06x concentrations. These values, along with the relative luminescence at 1x (i.e., 1), are fitted to a Lineweaver-Burk plot to yield the Km:app,total for the substrates ATP, luciferin, and CoA. The value is reported as the inverse ratio with the reference value, and the correlation coefficients are calculated.

5. Signal stability. The luminescence of the initial 1x luminescent reaction is re-measured 3 additional times over about 15 hours. These values are fitted to ln(L) = ln(L0) - (t/τ) and the integral over t (15 hours) is calculated. Signal stability is then calculated as S = (1 - int(L)/(L0 t))^2. The value is reported as the inverse ratio with the reference value, and the correlation coefficient is calculated.
6. Composite fitness. The values of criteria 2 through 5 are combined into a single composite value of fitness (or commercial utility). This value is based on a judgment of the relative importance of the other criteria. This judgment is given below:

Criteria               Relative Value
Stability              5
Signal Stability       2
Substrate Binding      2
Expression/Activity    1

The composite is C = Sum(criteria 2-5 weighted by relative value); e.g., more weight is on stability because that was a major goal.

EXAMPLE 2: Software 

Procedure: Organize data into SQL database. Each file created by a luminometer (96 well) (Anthos, Austria) represents the data from one microplate. These files are stored in the computer controlling the luminometer, which is connected to the database computer by a network link. From each microplate of samples, nine microplates are read by the luminometer (the original microplate for optical density and eight daughter microplates for luminescence).

Ninety files are created in total, each containing data sets for 96 samples. Each data set contains the sample number, time of each measurement relative to the first measurement of the plate, luminometer reading, and background corrected luminometer reading. Other file header information is also given. The time that each microplate is read is also needed for analysis. This can be obtained from the robot log or the file creation time. A naming convention that can be recognized by SQL is used by the robot during file creation (e.g., YYMMDDPR.DAT, where YY is the year, MM is the month, DD is the day, P is the initial plate [0-9], and R is the reading [0-8]).
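The naming convention can be illustrated with a small parser. This is only a sketch: the function and type names are hypothetical, and it assumes the file name carries exactly the eight digits described followed by `.DAT`.

```python
import re
from typing import NamedTuple


class PlateFile(NamedTuple):
    """Fields encoded in a YYMMDDPR.DAT file name."""
    year: int      # two-digit year (YY)
    month: int     # MM
    day: int       # DD
    plate: int     # initial plate, 0-9 (P)
    reading: int   # reading, 0-8 (R)


# YY MM DD P R, followed by the .DAT extension.
_PATTERN = re.compile(r"^(\d{2})(\d{2})(\d{2})(\d)([0-8])\.DAT$", re.IGNORECASE)


def parse_plate_filename(name: str) -> PlateFile:
    """Split a luminometer file name into its date, plate, and reading fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a YYMMDDPR.DAT file name: {name!r}")
    return PlateFile(*(int(group) for group in match.groups()))
```

For example, `parse_plate_filename("98031575.DAT")` would describe reading 5 of initial plate 7 on day 15 of month 3, year 98. Ten initial plates with nine readings each account for the ninety files mentioned above.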
Procedure: Data Reduction And Organization.

- Normalize luminescence data: For each measurement of luminescence in the eight daughter plates, the normalized luminescence is calculated by dividing by the optical density of the original plate. If any value of normalized luminescence is less than zero, assign the value of 0.1 sL, where sL is the standard deviation for measurements of normalized luminescence.
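A minimal sketch of this normalization step follows. It assumes that sL is the sample standard deviation of the normalized values passed to the function; the source does not say over which set of measurements sL is taken, and the function name is hypothetical.

```python
import statistics


def normalize_luminescence(luminescence, optical_density):
    """Divide each luminescence reading by the optical density of the same well.

    Negative normalized values are replaced by 0.1 * sL.
    Assumption: sL is the sample standard deviation of the values given here.
    """
    normalized = [lum / od for lum, od in zip(luminescence, optical_density)]
    floor = 0.1 * statistics.stdev(normalized)
    return [floor if value < 0 else value for value in normalized]
```

Replacing negative values with a small positive number keeps the later logarithmic fits defined.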

- Calculate relative measurement time: For each normalized luminescence measurement, the time of the measurement is calculated relative to the first measurement of the sample. For example, the times of all luminescence measurements of sample B6 in plate 7 (i.e., 7:B06) are calculated relative to the first reading of 7:B06. This time calculation will involve both the time when the plate is read and the relative time of when the sample is read in the plate.
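The time calculation can be sketched as below. The sketch assumes that both the plate read time and the within-plate offset are expressed in the same unit (hours here) and that the readings are supplied in chronological order; the function name is hypothetical.

```python
def relative_times(readings):
    """Return measurement times relative to the first reading of one sample.

    readings: list of (plate_read_time, offset_within_plate) pairs, in hours,
    ordered from the first reading to the last (an assumption of this sketch).
    """
    absolute = [plate_time + offset for plate_time, offset in readings]
    first = absolute[0]
    return [t - first for t in absolute]
```

The first entry is always zero, and later entries combine the plate read time with the position of the sample in the read sequence.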

- Calculate enzyme stability (τ): For each sample, use linear regression to fit ln(L1x) = ln(L0) - (t/τ) using the three luminescence measurements with 1x substrate concentrations (Plates 1, 5, 8). Also calculate the regression coefficient.
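The regression above can be sketched with ordinary least squares on ln(L) versus t. The function name is hypothetical and the code is an illustration of the stated equation, not the original software.

```python
import math


def fit_decay(times, luminescence):
    """Least-squares fit of ln(L) = ln(L0) - t/tau.

    Returns (tau, L0, r), where r is the correlation coefficient of ln(L) vs t.
    Luminescence values must be positive; a flat series (slope 0) is not handled.
    """
    ys = [math.log(value) for value in luminescence]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, ys))
    slope = sxy / sxx                      # equals -1/tau
    intercept = mean_y - slope * mean_t    # equals ln(L0)
    r = sxy / math.sqrt(sxx * syy)
    return -1.0 / slope, math.exp(intercept), r
```

A larger τ means slower loss of luminescence, so τ serves as the stability measure; for a decaying signal r is negative and close to -1 when the fit is good.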

- Calculate substrate binding (Km:app,total): Using microplates from the first set of readings (Plates 1 and 2), calculate the L0.02x,rel by dividing measurements made with substrate concentrations of 0.02x by those of 1x. Similarly, calculate the L0.06x,rel using microplates of the second set of readings (Plates 5 and 6), by dividing measurements made with substrate concentrations of 0.06x by those of 1x.

For each sample, use linear regression to fit 1/L = (Km:app,total/Lmax:app)(1/[S]) + (1/Lmax:app) using

[S]      L
0.02     L0.02x,rel
0.06     L0.06x,rel
1        1

Km:app,total is calculated as the slope/intercept. Also calculate the regression coefficient.
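The Lineweaver-Burk fit described above can be sketched as a straight-line fit of 1/L against 1/[S] through the three points (0.02x, 0.06x, and 1x, where the 1x relative luminescence is 1 by definition). The function name is hypothetical, and the resulting Km is expressed in units of the 1x substrate concentration.

```python
import math


def fit_km_app(l_002_rel, l_006_rel):
    """Fit 1/L = (Km/Lmax)(1/[S]) + 1/Lmax and return (Km_app_total, r).

    l_002_rel, l_006_rel: luminescence at 0.02x and 0.06x relative to 1x.
    """
    points = [(0.02, l_002_rel), (0.06, l_006_rel), (1.0, 1.0)]
    xs = [1.0 / s for s, _ in points]
    ys = [1.0 / lum for _, lum in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    # Km = slope / intercept, as stated in the text.
    return slope / intercept, r
```

A lower Km:app,total indicates tighter apparent substrate binding, which is why this criterion is reported as an inverse ratio with the reference value.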
- Calculate signal stability (S): For each sample, use linear regression to fit ln(L) = ln(L0) - (t/τ) using the four luminescence measurements of the initial microplate with 1x substrate concentrations (Plates 1, 3, 4, and 7). Also calculate the regression coefficient. From the calculated values of τ and L0, calculate the integral of luminescence by int(L) = τ L0 (1 - exp(-tf/τ)), where tf is the average time of the last measurement (e.g., 15 hours). The signal stability is calculated as S = (1 - int(L)/(Li tf))^2, where Li is the initial measurement of normalized luminescence with 1x substrate concentration (Plate 1).

[Note: To correct for evaporation, an equation S = (1 + K - int(L)/(Li tf))^2 may be used, where 1/K = 2(relative change of liquid volume at tf).]
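A minimal sketch of the signal stability calculation follows, using the fitted τ and L0 from the regression. The function and argument names are hypothetical; the optional `k` term corresponds to the evaporation correction K in the note and defaults to zero.

```python
import math


def signal_stability(tau, l0, l_initial, t_final, k=0.0):
    """Return S = (1 + K - int(L) / (Li * tf))**2.

    int(L) = tau * L0 * (1 - exp(-tf / tau)) is the integral of the fitted
    exponential from 0 to tf. With k = 0 this is the uncorrected form.
    """
    # expm1 keeps precision when tf / tau is very small.
    integral = -tau * l0 * math.expm1(-t_final / tau)
    return (1.0 + k - integral / (l_initial * t_final)) ** 2
```

A perfectly steady signal gives an integral equal to Li tf and therefore S close to 0, so smaller S means a more stable signal; this matches the use of the inverse ratio for this criterion.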

- Calculate the reference value surfaces: A three dimensional coordinate system can be defined by using the grid positions of the samples within a microplate as the horizontal coordinates, and the calculated values for the samples (Li, τ, Km:app,total, or S) as the vertical coordinate. This three dimensional system is referred to as a "plate map". A smooth surface in the plate maps representing a reference level can be determined by least squares fit of the values determined for the 8 reference clones in each microplate. For each of the 10 initial microplates of samples, respective reference surfaces are determined for the criteria parameters Li, τ, Km:app,total, and S (40 surfaces total).

In the least squares fit, the vertical coordinate (i.e., the criteria parameter) is the dependent variable, and the horizontal coordinates are the independent variables. A first order surface (i.e., z = ax + by + c) is fitted to the values of the reference clones. After the surface is calculated, the residuals to each reference clone are calculated. If any of these residuals is outside of a given cutoff range, the reference surface is recalculated with omission of the aberrant reference clone.

If a first order surface does not sufficiently represent the values of the reference clones, a restricted second order surface is used (i.e., z = a(x^2 + ky^2) + bx + cy + d, where k is a constant).
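The first order reference surface can be sketched as a least squares plane fit through the reference clone values. This illustration makes two assumptions that the source does not spell out: only the single worst reference clone is omitted when its residual exceeds the cutoff, and the refit is done once. The function names are hypothetical, and the restricted second order surface is not covered.

```python
def fit_plane(points):
    """Least-squares fit of z = a*x + b*y + c to a list of (x, y, z) points."""
    rows = [(x, y, 1.0) for x, y, _ in points]
    zs = [z for _, _, z in points]
    # Augmented normal equations (A^T A | A^T z) for the unknowns (a, b, c).
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * z for r, z in zip(rows, zs))] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def reference_surface(points, cutoff):
    """Fit the plane; refit once without the worst clone if it exceeds cutoff."""
    a, b, c = fit_plane(points)
    residuals = [abs(z - (a * x + b * y + c)) for x, y, z in points]
    worst = max(range(len(points)), key=residuals.__getitem__)
    if residuals[worst] > cutoff and len(points) > 3:
        kept = points[:worst] + points[worst + 1:]
        a, b, c = fit_plane(kept)
    return a, b, c
```

Here x and y would be the column and row indices of the reference wells, and the reference value for any sample well is obtained by evaluating a*x + b*y + c at that well's position.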

- Calculate the reference-normalized values: For the criteria parameter of each sample, a reference-normalized value is determined by calculating the ratio or inverse ratio with the respective reference value. The reference-normalized values are Li/Lir, τ/τr, Kmr/Km:app,total, and Sr/S, where reference values are calculated from the equations of the appropriate reference surface.

- Calculate the composite scores: For each sample, calculate

C = 5(τ/τr) + 2(Sr/S) + 2(Kmr/Km:app,total) + (Li/Lir).
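The composite score can be illustrated as a weighted sum of the four reference-normalized values, with the weights 5, 2, 2, and 1 taken from the relative values listed under the screening criteria. The function and argument names are hypothetical.

```python
def composite_score(tau, tau_ref, s, s_ref, km, km_ref, l_i, l_ref):
    """Return C = 5(tau/tau_ref) + 2(S_ref/S) + 2(Km_ref/Km) + (Li/Li_ref).

    Enzyme stability and expression use the ratio with the reference value;
    signal stability and substrate binding use the inverse ratio, because
    lower S and lower Km are better.
    """
    return (5.0 * (tau / tau_ref)
            + 2.0 * (s_ref / s)
            + 2.0 * (km_ref / km)
            + (l_i / l_ref))
```

A sample identical to the reference scores 10; doubling the enzyme stability alone raises the score to 15, which reflects the heavier weight placed on stability.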

- Determine subgroupings: For the criteria parameters Li, τ, Km:app,total, S, and C, delimiting values (i.e., bin sizes) for subgroupings are defined as gL, gτ, gKm, gS, and gC. Starting with the highest values for Li, τ, or C, or the lowest values of Km:app,total or S, the samples are assigned to bins for each criteria parameter (the first bin being #1, and so on).
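One way to read the binning rule is as fixed-width bins measured from the best value. This is an assumed interpretation, since the source does not define the bin boundaries precisely, and the function name is hypothetical.

```python
import math


def assign_bins(values, bin_size, higher_is_better=True):
    """Assign 1-based bin numbers, counting outward from the best value.

    Assumption: bins are fixed-width intervals of width bin_size starting at
    the best value (the highest for Li, tau, C; the lowest for Km and S).
    """
    best = max(values) if higher_is_better else min(values)
    return [1 + math.floor(abs(best - value) / bin_size) for value in values]
```

For example, with a bin size of 0.25 the values 1.0 and 0.9 fall in bin 1, while 0.45 falls in bin 3.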

- Display sorted table of reference-normalized values: Present a table of data for each sample showing in each row the following data:

- sample identification number (e.g., 7:B06)
- composite score (C)
- reference-normalized enzyme stability (τ/τr)
- correlation coefficient for enzyme stability
- bin number for enzyme stability
- reference-normalized signal stability (Sr/S)
- correlation coefficient for signal stability
- bin number for signal stability
- reference-normalized substrate binding (Kmr/Km:app,total)
- correlation coefficient for substrate binding
- bin number for substrate binding
- reference-normalized expression/specific activity (Li/Lir)
- bin number for expression/specific activity

The table is sorted by the composite score (C).
Procedure: Present sorted table of criteria parameters.

Present a table of data for each sample showing in each row the following data:

- sample identification number
- composite score (C)
- enzyme stability (τ)
- correlation coefficient for enzyme stability
- bin number for enzyme stability
- signal stability (S)
- correlation coefficient for signal stability
- bin number for signal stability
- substrate binding (Km:app,total)
- correlation coefficient for substrate binding
- bin number for substrate binding
- expression/specific activity (Li)
- bin number for expression/specific activity

The table is sorted by the composite score (C); the reference clones are excluded from the table. Same entry coding by standard deviation as described above.

Procedure: Present sorted table of reference-normalized values.

This is the same procedure as the final step of the data reduction procedure. The table will show:

- sample identification number
- composite score (C)
- reference-normalized enzyme stability (τ/τr)
- correlation coefficient for enzyme stability
- bin number for enzyme stability
- reference-normalized signal stability (Sr/S)
- correlation coefficient for signal stability
- bin number for signal stability
- reference-normalized substrate binding (Kmr/Km:app,total)
- correlation coefficient for substrate binding
- bin number for substrate binding
- reference-normalized expression/specific activity (Li/Lir)
- bin number for expression/specific activity

The table is sorted by the composite score (C); the reference clones are excluded from the table. Same entry coding by standard deviation as described above.

Procedure: Present sorted table of criteria parameters for reference clones.

This is the same procedure as described above for criteria parameters, except for only the reference clones. The table will show:

- sample identification number
- composite score (C)
- enzyme stability (τ)
- correlation coefficient for enzyme stability
- bin number for enzyme stability
- signal stability (S)
- correlation coefficient for signal stability
- bin number for signal stability
- substrate binding (Km:app,total)
- correlation coefficient for substrate binding
- bin number for substrate binding
- expression/specific activity (Li)
- bin number for expression/specific activity

The table is sorted by the composite score (C). Same entry coding by standard deviation as described above.

Procedure: Present sorted table of reference-normalized values for reference clones.

This is the same procedure as described above for reference-normalized values, except for only the reference clones. The table will show:

- sample identification number
- composite score (C)
- reference-normalized enzyme stability (τ/τr)
- correlation coefficient for enzyme stability
- bin number for enzyme stability
- reference-normalized signal stability (Sr/S)
- correlation coefficient for signal stability
- bin number for signal stability
- reference-normalized substrate binding (Kmr/Km:app,total)
- correlation coefficient for substrate binding
- bin number for substrate binding
- reference-normalized expression/specific activity (Li/Lir)
- bin number for expression/specific activity

The table is sorted by the composite score (C). Same entry coding by standard deviation as described above.

Procedure: Sort table. 

Any table may be sorted by any entries as primary and secondary key.

Procedure: Display histogram of table.

For any table, a histogram of criteria parameter vs. bin number may be displayed for any criteria parameter.

Procedure: Display plate map. 

For any plate, a plate map may be displayed showing a choice of:

- any luminescence or optical density measurement
- Li
- Li reference surface
- τ
- τ reference surface
- correlation coefficient of τ
- S
- S reference surface
- Sr/S
- correlation coefficient of S
- Km:app,total
- Km reference surface
- correlation coefficient for Km:app,total
- composite score (C)

The plate maps are displayed as a three dimensional bar chart. Preferably, the 
bars representing the reference clones are indicated by color or some other means. 

Procedure: Display drill-down summary of each entry.

For Li, τ, Km:app,total, and S, any entry value in a table may be selected to display the luminescence and optical density reading underlying the value calculation, and a graphical representation of the curve fit where appropriate. Preferably the equations involved and the final result and correlation coefficient will also be displayed.

- Li or Li/Lir. Display the optical density and luminescence value from the chosen sample in Plate 0 and Plate 1.

- τ or τ/τr. Display the optical density and luminescence value from the chosen sample in Plate 0, Plate 1, Plate 5, and Plate 8. Display graph of ln(L1x) vs. t, showing data points and best line.

- S or Sr/S. Display the optical density and luminescence value from the chosen sample in Plate 0, Plate 1, Plate 3, Plate 4, and Plate 7. Display graph of ln(L) vs. t, showing data points and best line.

- Km:app,total or Kmr/Km:app,total. Display the optical density and luminescence value from the chosen sample in Plate 0, Plate 1, Plate 2, Plate 5, and Plate 6. Display graph of 1/L vs. 1/[S], showing data points and best line.
EXAMPLE 3: Preparation Of Novel Luciferases

The gene of FIG. 1 contains a single base pair mutation at position 249, T to M. This clone has a spectral maximum of 552 nm, which is yellow shifted from the sequence of Luc. This mutant was selected as an original template because it is about 5 times brighter in vivo, which allowed for more efficient screening.

C-terminus mutagenesis 

To eliminate the peroxisome targeting signal (-SKL) the L was mutated to a STOP and the 3 codons immediately upstream were randomized according to the oligonucleotide mutagenesis procedure described herein. The mutagenic oligonucleotide designed to accomplish this also introduces a unique SpeI site to allow mutant identification without sequencing. The mutants were screened in vivo and 13 colonies picked, 12 of which contained the SpeI site.

N-terminus mutagenesis

To test if expression could be improved, the 3 codons immediately downstream from the initiation Met were randomized as described herein. The mutagenic oligo designed to accomplish this also introduces a unique ApaI site to allow mutant identification without sequencing. Seven clones were selected, and six of the isolated plasmids were confirmed to be mutants.

Shuffling of C- and N-terminus mutants

The C- and N-terminus mutagenesis was performed side-by-side. To combine the N- and C-terminus mutations, selected clones from each mutagenesis experiment were combined with the use of recombination mutagenesis according to the recombination mutagenesis protocol described herein. The shuffled mutants were subcloned into amp^ pRAM backbone and screened in DH5 F'IQ (BRL; Hanahan, 1985). A total of 24 clones were picked; only 4 contained both the N- and C-terminus mutations. These 4 clones were used as templates for randomization of the cysteine positions in the gene.
Mutagenesis to randomize cysteine positions/Random mutagenesis and recombination mutagenesis in the Luc gene

There are 7 cysteine positions in the Ppe-2 gene. It is known that these positions are susceptible to oxidation, which could cause destabilization of the protein. Seven oligonucleotides were ordered to randomize the cysteine positions.

The oligonucleotides were organized into two groups based upon the conservation of cysteine in other luciferase genes from different families. Group 1 randomizes the conserved cysteine positions C-60, C-80, and C-162. Group 2 randomizes cysteines that are not strictly conserved at positions C-38, C-127, C-221, and C-257.

The four selected templates from the N and C terminus mutagenesis were sub-cloned into an ampicillin-sensitive backbone and single-stranded DNA was prepared for each of the templates. These templates were combined in equal amounts and oligonucleotide mutagenesis was completed as described herein. It was determined by plating an aliquot of the mutS transformation prior to overnight incubation that each of the 2 groups contained 2x10" independent transformants. MutS-DNA was prepared for the 2 groups and was then transformed into JM109 cells for screening. Mutants from group 1 were screened in vivo and picks were made for a full robotic run. Five clones were selected that had improved characteristics. Mutants from group 2 were screened in vivo and picks were made for a full robotic run. The temperature incubator on the robot was set at 33°C for this set of experiments. Ten clones were selected that had improved characteristics.

The fifteen best picks from both groups of the cysteine mutagenesis experiments were shuffled together as described herein and 18 of the best clones were selected after robotic processing.

The "best" clone from the above experiment (31-1G8) was selected as a template for subsequent rounds of mutagenesis. (The high temperature robot incubator temperature was set to 42°C.) Another complete round of mutagenesis was completed.
The 18 best clones from the above mutagenesis were picked and clone (39-5B10) was selected as the best clone and was used as a template for another round of mutagenesis. (The high temperature robot incubator temperature was set at 49°C.)

After this cycle, 6 of the best clones were selected for sequencing. Based upon the sequence data, nine positions were selected for randomization and seven oligos were designed to cover these positions. Based upon data generated from the robot, it was determined that the best clone from the group of six clones that were sequenced was clone (49-7C6). The luciferase gene from this clone was sub-cloned into an ampicillin-sensitive pRAM backbone and single stranded DNA was prepared. The randomization of the selected positions was completed according to the oligonucleotide mutagenesis procedure listed above.

The randomization oligos were divided into 4 groups, and transformants from these experiments were picked and two robotic runs were completed. Ten clones were selected from the two experiments. (The high temperature robot incubator temperature was set at 56°C.)

The best 10 picks from the above two experiments, and the best 18 picks from the previous population of clones were shuffled together (recombination mutagenesis protocol).

The 18 best clones were selected and clone 58-0A5 was determined to be the best clone. This clone was then used as a template for another round of mutagenesis. The high temperature robot incubator temperature was set at 56°C. Clone 71-5D4 was selected as a new lead clone and another round of mutagenesis was completed. The incubator was set at 60°C.

The best 18 picks were selected and the best clone from this group was determined to be clone 78-0B10. The temperature stability of clones at various temperatures is presented in the FIGS.

EXAMPLE 4: Mutagenesis Strategy From Clone 78-0B10 to 90-1B5
1. 23 oligos (oligonucleotides) were ordered to change 28 positions to consensus. All of the oligos were tested individually using oligo directed mutagenesis with single stranded DNA from clone luc78-0B10 as a template to determine which oligos gave an improvement in stability. Below is a table which lists the mutagenic oligos.

Description                      Oligo synthesis #
A17 to T                         6215
M25 to L                         6216
S36 to P; remove Nsi I site      6217
A101 to V, S105 to N             6218
I125 to V                        6219
K139 to Q                        6220
V145 to I                        6221
V194 to I                        6222
V203 to L, S204 to P             6231
A216 to V                        6232
A229 to Q                        6233
M249 to T (reversion)            6234
T266 to R, K270 to E             6235
E301 to D                        6236
N333 to P, F334 to G             6237
R356 to K                        6238
I363 to V                        6246
A393 to P                        6247
R417 to H                        6248
G482 to V                        6249
N492 to T                        6250
F499 to Y, S501 to A             6251
L517 to V                        6252
F537 to L                        6253

*Note that oligo #6234 does not change a consensus position. This oligo causes a reversion of position 249 to the wild-type PPE-2 codon. Although reversion of this position was shown to increase thermostability, reversion of this position decreased light output.
Oligonucleotide-directed mutagenesis with clone luc78-0B10 as a template: Based on the results of individually testing the mutagenic oligonucleotides listed above, three experiments were completed and oligos for these experiments were divided in the following manner:

a. 6215, 6234, 6236, 6248 (found to give increased stability)

b. 6215, 6217, 6218, 6219, 6220, 6221, 6222, 6231, 6233, 6234, 6236, 6238, 6247, 6248, 6249, 6251, 6253 (found to be neutral or have increased stability)

c. All 23 oligos.

Selections from the three experiments listed above were screened with the robotic screening procedure (Experiment 84) (luc78-0B10 used as a control). Selections from Experiment 84 were recombined using the recombination mutagenesis procedure and then screened with the robotic screening procedure (Experiment 85).

Single stranded DNA was prepared from three (3) clones, luc85-3E12, luc85-4F12, and luc85-5A4. These clones were used as templates for oligonucleotide-directed mutagenesis to improve codon usage. Positions were selected based upon a codon usage table published in Nucleic Acids Research vol. 18 (supplement) 1990, page 2402. The table below lists oligos that were used to improve codon usage in E. coli.



Description                                  Oligo synthesis #
L7-(tta-ctg), remove Apa I site              6258
L29-(tta-ctg)                                6259
T42-(aca-acc)                                6260
L51,L56-(tta-ctg), L58-(ttg-ctg)             6261
L71-(tta-ctg)                                6262
L85-(ttg-ctg)                                6263
L95-(ttg-ctg), L97-(ctt-ctg)                 6273
L113,L117-(tta-ctg)                          6274
L151,L153-(tta-ctg)                          6275
L163-(ctc-ctg)                               6276
R187-(cga-cgt)                               6277
L237-(tta-ctg)                               6279
R260-(cga-cgc)                               6280
L285,L290-(tta-ctg), L286-(ctt-ctg)          6281
L308-(tta-ctg)                               6282
L318-(tta-ctg)                               6283
L341-(tta-ctg), T342-(aca-acc)               6284
L380-(ttg-ctg)                               6285
L439-(tta-ctg)                               6286
L458-(ctc-ctg), L457-(tta-ctg)               6293
T506-(aca-acc), L510-(cta-ctg)               6305
R530-(aga-cgt)                               6306



6. In the first experiment, the three templates listed above from Experiment 85 were combined and used as templates for oligonucleotide-directed mutagenesis. All of the oligos were combined in one experiment and clones resulting from oligonucleotide-directed mutagenesis were screened using the robotic screening procedure as Experiment 88. A low percentage of luminescent colonies resulted from this experiment, so another oligonucleotide-directed mutagenesis experiment was completed in which the oligonucleotides were combined in the following groups:

a. 6258, 6273, 6280, 6286

b. 6259, 6274, 6281, 6293

c. 6260, 6275, 6282, 6294

d. 6261, 6276, 6283, 6305

e. 6262, 6277, 6284, 6306

f. 6263, 6279, 6285
7. It was discovered that samples from group b had a low number of luminescent colonies, and it was hypothesized that one of the oligos in group b was causing problems. Selections were made from all of the experiments with the exception of experiment b. Samples were then run through the robotic screening procedure (Experiment 89).

8. Selections from Experiments 88 and 89 were shuffled together with the recombination mutagenesis protocol and were then screened with the robotic screening procedure (Experiment 90).

MATERIALS AND METHODS

A. Mutagenesis Protocol

The mutant luciferases disclosed herein were produced via random mutagenesis with subsequent in vivo screening of the mutated genes for a plurality of characteristics including light output and thermostability of the encoded luciferase gene product. The mutagenesis was achieved by generally following a three-step method:

1. Creating genetic diversity through random mutagenesis. Here, error-prone PCR of a starting sequence such as that of Luc was used to create point mutations in the nucleotide sequence. Because error-prone PCR yields almost exclusively single point mutations in a DNA sequence, a theoretical maximum of 7 amino acid changes is possible per nucleotide mutation. In practice, however, approximately 6.1 amino acid changes per nucleotide are achievable. For the 550 amino acids in luciferase, approximately 3300 mutants are possible through point mutagenesis.

2. Consolidating single point mutations through recombination mutagenesis. The genetic diversity created by the initial mutagenesis is recombined into a smaller number of clones by sPCR. This process not only reduces the number of mutant clones, but because the rate of mutagenesis is high, the probability of linkage to negative mutations is significant. Recombination mutagenesis unlinks positive mutations from negative mutations. The mutations are "re-linked" into new genes by recombination mutagenesis to yield the new permutations. Then, after re-screening the recombination mutants, the genetic permutations that have the "negative mutations" are eliminated by not being selected. Recombination mutagenesis also serves as a secondary screen of the initial mutants prepared by error-prone PCR.

3. Broadening genetic diversity through random mutagenesis of selected codons. Because random point mutagenesis can only achieve a limited number of amino acid substitutions, complete randomization of selected codons is achieved by oligonucleotide mutagenesis. The codons to be mutated are selected from the results of the preceding mutagenesis processes on the assumption that for any given beneficial substitution, other alternative amino acid substitutions at the same positions may produce even greater benefits. The positions to be mutated are identified by DNA sequencing of selected clones.

B. Initial mutagenesis experiments 

Both the N-terminus and the C-terminus of the starting sequence were modified by oligonucleotide-directed mutagenesis to optimize expression and remove the peroxisomal targeting sequence. At the N-terminus, nine bases downstream of the initiation codon were randomized; at the C-terminus, nine bases upstream of the termination codon were randomized. Mutants were analyzed using an in vivo screen, resulting in no significant change in expression.

Six clones from this screen were pooled, and used to mutate the codons for seven cysteines. These codons were randomized using oligonucleotide-directed mutagenesis, and the mutants were screened using the robotic screening procedure. From this screen, fifteen clones were selected for directed evolution.

C. Generating and Testing Clones

Several very powerful and widely known protocols are used to generate and test the clones of the present invention. Unless noted otherwise, these laboratory procedures are well known to one of skill in the art. Particularly noted as being well known to the skilled practitioner is the polymerase chain reaction (PCR) devised by Mullis and various modifications to the standard PCR protocol (error-prone PCR, sPCR, and the like), DNA sequencing by any method (Sanger's or Maxam & Gilbert's methodology), amino acid sequencing by any method (e.g., the Edman degradation), and electrophoretic separation of polynucleotides and polypeptides/proteins.

D. Vector Design 

A preferred vector (pRAM) used for the mutagenesis procedure contains
several unique features that allow the mutagenesis strategy to work efficiently:
The pRAM vector contains a filamentous phage origin, f1, which is
necessary for the production of single-stranded DNA.

Two SfiI sites flank the gene. These sites were designed so that the
gene to be subcloned can only be inserted in the proper orientation.
The vector contains a tac promoter.
Templates to be used for oligonucleotide mutagenesis contain a 4 base-pair
deletion in the bla gene which makes the vector ampicillin-sensitive. The
oligonucleotide mutagenesis procedure uses a mutant oligonucleotide as well as an
ampicillin repair oligonucleotide that restores function to the bla gene. This
allows for the selection of a high percentage of mutants. (If selection is not used,
it is difficult to obtain a high percentage of mutants.)



E. Uses of Luciferases 

The mutant luciferases of the present invention are suitable for use in any
application for which previously known luciferases were used, including the
following:

ATP Assays. The greater enzyme stability means that reagents designed
for detection of ATP have a greater shelf-life and operational-life at higher
temperatures (e.g., room temperature). Therefore, a method of detecting ATP
using luciferases with increased thermostability is novel and useful.

Luminescent labels for nucleic acids, proteins, or other molecules.
Analogous to the advantages of the luciferases of the present invention for ATP
assays, their greater shelf-life and operational-life is a benefit to the reliability and
reproducibility of luminescent labels. This is particularly advantageous for
labeling nucleic acids in hybridization procedures where hybridization
temperatures can be relatively high (e.g., greater than 40°C). Therefore, a method
of labeling nucleic acids, proteins, or other molecules using luciferases of the
present invention is novel and useful.

Genetic reporter. In the widespread application of luciferase as a genetic
reporter, where detection of the reporter is used to infer the presence of another
gene or process of interest, the increased thermal stability of the luciferases
provides less temperature dependence of its expression in living cells and in
cell-free translation and transcription/translation systems. Therefore, a method using
the luciferases of the present invention as genetic reporters is novel and useful.

Enzyme immobilization. Enzymes in close proximity to physical surfaces
can be denatured by their interaction with that surface. The high density
immobilization of luciferases onto a surface to provide strong localized
luminescence is improved by using high stability luciferases. Therefore, a method
of immobilizing luciferases onto a solid surface using luciferases of the present
invention is novel and useful.

Hybrid proteins. Hybrid proteins made by genetic fusion of genes encoding
luciferases and of other genes, or through a chemical coupling process, benefit by
having a greater shelf-life and operational-life. Therefore, a method of producing
hybrid proteins through genetic means or chemical coupling using the luciferases
of the present invention is novel and useful.

High temperature reactions. The light intensity of a luciferase reaction
increases with temperature until the luciferase begins to denature. Because the use
of thermostable luciferases allows for use at greater reaction temperatures, the
luciferases of the present invention are novel and useful for performing high
temperature reactions.

Luminescent solutions. Luminescence has many general uses, including
educational, demonstrational, and entertainment purposes. These applications
benefit from having enzymes with greater shelf-life and operational-life.
Therefore, a method of making luminescent solutions using the luciferases of the
present invention is novel and useful.

F. Firefly luciferase 

The firefly luciferase gene chosen for directed evolution was Luc isolated
from Photuris pennsylvanica. The luciferase was cloned from fireflies collected in
Maryland by Wood et al. and later was independently cloned by Dr. Leach using
fireflies collected in Oklahoma (Ye et al.) (1977). A mutant of this luciferase
(T249M) was made by Wood et al. and used in the present invention because it
produced approximately 5-fold more light when expressed in colonies of E. coli.

Overview of Evolution Process: Directed evolution was achieved through
a recursive process, each step consisting of multiple cycles of 1) creating
mutational libraries of firefly luciferase followed by 2) screening the libraries to
identify new mutant clones having a plurality of desired enzymological
characteristics.

To begin the process, three mutational libraries were created using
error-prone PCR (Fromant et al., 1995). Each library was screened first by visual
evaluation of luminescence in colonies of E. coli (Wood and De Luca, 1987), and
then by quantitative measurements of enzymological properties in E. coli cell
lysates. Approximately 10,000 colonies were examined in the visual screen, from
which 704 were selected for quantitative analysis. From each quantitative screen
18 clones were selected.

The three sets of 18 clones each were pooled together, and a new
mutational library was created using DNA shuffling to generate intragenic
recombinations (sPCR; Stemmer, 1994). The results were screened to yield
another set of 18 clones. The entire process was completed by combining this set
of 18 clones with 18 clones from the previous round of evolution, creating another
mutational library by DNA shuffling, and screening as before.

Screening method: In the qualitative visual screen, colonies were selected
only for their ability to sustain relatively bright luminescence. The thermal
stability of the luciferase within the colonies of E. coli was progressively
challenged in successive rounds of evolution by increasing the temperature of the
screen. The selected colonies were inoculated into wells of 96-well plates each
containing 200 µl of growth medium.

In the quantitative screens, lysates of the E. coli cultures were measured for
1) luminescence activity, 2) enzyme stability, 3) sustained enzymatic turnover, and
4) substrate binding.

"Luminescence activity" was measured as the ratio of luminescence
intensity to the optical density of the cell culture.

"Enzyme stability" was determined by the rate of activity loss from cell
lysates over 10 hours. In successive rounds of evolution the incubation
temperature of the lysates was increased.

"Sustained enzymatic turnover" was determined by the rate of
luminescence loss of a signal enzymatic reaction over 10 hours at room
temperature.

"Substrate binding" was determined by the relative activity of the
lysate when assayed with diluted substrate mixtures. Of these four parameters, the
highest priority for selection was placed on thermostability.

Robotic Automation. Robotic automation was used in the quantitative
screens to accurately perform the large number of required quantitative assays on
the cultured cells. Overnight cultures were first diluted into fresh medium and
grown for 3 hours to produce cultures in mid-log phase growth. The optical
density of each culture was then measured, and aliquots of the cultures were
lysed by freeze/thaw and lysozyme. The resulting lysates were further diluted
before analysis and incubated at elevated temperatures. Luminescence was
measured from aliquots of the diluted lysates, taken at various times, and
measured under various conditions as prescribed by the analytical method (see
Example 2). Computer analysis of this data yielded the quantitative selection
criteria described above.



WO 99/14336    PCT/US98/19494



Summary of evolutionary progression: After mutagenesis of the N- and
C-termini, and randomization of the cysteine codons, a pool of 15 clones was
subjected to two rounds of directed evolution as described herein. Five of the 18
clones resulting from this process were sequenced to identify mutations. One of
these clones, designated 49-7C6, was chosen for more detailed analysis and
further mutagenesis. This clone contained 10 new amino acid substitutions
compared to the luciferase Luc[T249M].

To assess the potential for other amino acid replacements at the sites of
these substitutions, oligonucleotide-directed mutagenesis was used to randomize
these codons. The resulting clones were screened as described herein, and 18
selected clones were used to initiate two new rounds of directed evolution. Of the
18 clones resulting from this second set of rounds, the clone designated 78-0B10
was chosen for additional study and mutagenesis. This clone encoded a luciferase
that contained 16 new amino acid substitutions compared to Luc[T249M].

Using oligonucleotide-directed mutagenesis with 78-0B10 as the template,
codons were selected for substitution to consensus amino acids previously known
among beetle luciferases. Selections from this mutagenesis experiment were
shuffled together, and three clones, determined to be the most stable, were then
used as templates for oligonucleotide mutagenesis to improve codon usage in
E. coli. A clone designated 90-1B5, selected from this experiment, contained 28
amino acid substitutions relative to Luc[T249M]. Out of 25 codons selected for
change to consensus amino acids, 11 were replaced in the clone designated
90-1B5. Only five out of the 30 positions that were selected for improved codon
usage were substituted and had little effect on enzyme expression.

Protein purification. The four mutants that are described herein
(Luc[T249M], 49-7C6, 78-0B10, and 90-1B5) were purified using a previously
published procedure (Hastings et al., 1996).

Enzymological characterization. Purified proteins were diluted in
25 mmol/L HEPES pH 7.8, 150 mmol/L NaCl, 0.1 mmol/L EDTA, 1 mg/mL BSA.
Enzyme stability was determined from diluted proteins incubated at different
temperatures, and aliquots were removed at different time points. A linear
regression of the natural log of the luminescence and time was calculated.
Half-life was calculated as the ln(0.5)/slope of the regression.
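
The half-life calculation described above can be sketched as follows. This is a minimal illustration, not the analysis software actually used; the function name and the sample readings are hypothetical, and an ordinary least-squares fit is assumed for the regression.

```python
import math

def half_life(times, luminescence):
    """Fit ln(luminescence) against time by least squares.

    Returns (slope, half_life), where half_life = ln(0.5) / slope.
    """
    logs = [math.log(v) for v in luminescence]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    return slope, math.log(0.5) / slope

# Hypothetical readings: a signal constructed to halve every 2 hours.
times_h = [0.0, 1.0, 2.0, 4.0, 8.0]
signal = [1000.0 * 0.5 ** (t / 2.0) for t in times_h]

slope, t_half = half_life(times_h, signal)
print(f"slope = {slope:.4f} per hour, half-life = {t_half:.2f} h")
```

Because the slope of a decaying signal is negative, dividing ln(0.5), which is also negative, by the slope gives a positive half-life in the same time units as the measurements.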

E. PCR Mutagenesis Protocol (Random Mutagenesis):
PCR mutagenesis reactions

1. Prepare plasmid DNA from a vector containing the gene of interest,
estimate DNA concentration from a gel.

2. Set up two 50 µl reactions per group:

There are three groups of mutagenic conditions using different skewed
nucleotide concentrations.

The conditions listed herein yield in the range of from 8-10% wild-type
Luc colonies after subcloning phenotypic for each generated parent clone. The rate
of mutagenesis is estimated by the number of luminescent colonies that are present
after mutagenesis. Based upon results of clones mutated in the range of 8-10%, it
was determined that this level of mutagenesis produces on average approximately
2-3 amino acid changes per gene. If the mutagenesis rate is selected so that on
average there is one amino acid change per gene, then on average 50% of the
clones will have no mutations. (Bowie et al., 1990)

For the master mix: add all components except polymerase, vortex, spin
briefly, add polymerase, and mix gently.



Component             AtoT/TtoA      AtoC/TtoG      GtoA/CtoT

dATP                  0.3mM          0.1mM          0.25mM
dCTP                  2.75mM         4mM            1mM
dGTP                  0.06mM         0.02mM         0.05mM
dTTP                  0.625mM        0.3mM          0.6mM
*pRAMtailUP           0.4 pmol/µl    0.4 pmol/µl    0.4 pmol/µl
*pRAMtailDN           0.4 pmol/µl    0.4 pmol/µl    0.4 pmol/µl
*Taq Polymerase       1U/µl          1U/µl          1U/µl
°MgCl2                6.77mM         5.12mM         2.7mM
°MnCl2                0.5mM          0.5mM          0.3mM
DNA                   50ng total     50ng total     50ng total
10x PCR buffer        1X             1X             1X
Autoclaved nanopure   To 50 µl       To 50 µl       To 50 µl



* Taq Polymerase is purchased from Perkin Elmer (N808-0101).

10x Taq polymerase buffer (aliquot the Taq into 1.5 ml tubes and store at -70°C):

- 100mM Tris-HCl pH 8.4 from 1M stock

- 500mM KCl

Primers are diluted from a 1 nmol/µl stock to a 20 pmol/µl working stock.
pRAMtailup: 5'-gtactgagacgacgccagcccaagcttaggcctgagtg-3'
pRAMtaildn: 5'-ggcatgagcgtgaactgactgaactagcggccgccgag-3'
° MnCl2 and MgCl2 are made fresh from 1M stocks. The stocks are filter
sterilized and mixed with sterile water to make the 10mM and 25mM stocks which
are then stored in polystyrene Nalgene containers at 4°C.

Cycle in thermal cycler: 94°C for 1 min (94°C-1 min, 72°C-10 min) 10x.
3. Purify reaction products with Wizard PCR purification kit (Promega
Corporation, Madison, Wisconsin, part#A718c):

- transfer PCR reaction into a new tube containing 100 µl Promega
Direct Purification buffer (Part#A724a)

- add 1 ml of Promega Wizard PCR Purification Resin (part#A718c)
and incubate at room temperature for 1 min

- pull resin through Wizard minicolumn

- wash with 80% ethanol

- spin in microcentrifuge to remove excess ethanol

- elute into 50 µl sterile nanopure water (allow water to remain on
column for at least 1 min)

Amplification¹ of Mutagenesis Reaction

1. Set up five 50 µl reactions per group:

- To master mix: add all components except polymerase, vortex,
spin briefly, add polymerase, mix gently.

° 10x reaction buffer for Native PFU contains 20mM MgCl2, so no
additional MgCl2 needs to be added
+ primers:

pRAM18up - 5'-gtactgagacgacgccag-3'
pRAM19dn - 5'-ggcatgagcgtgaactgac-3'
Cycling conditions: 94-30 sec (94-20 sec, 65-1 min, 72-3 min) 25x
(Perkin-Elmer Gene Amp® PCR System 2400)

2. Load 1 µl on a gel to check amplification products

3. Purify amplification reaction products with Wizard PCR purification kit
(Promega Corporation, part#A718c):

- transfer PCR reaction into a new tube containing 100 µl Direct
Purification buffer (Promega, Part#A724a)

- add 1 ml of Wizard PCR Purification Resin (Promega
Part#A718c) and incubate at room temperature for 1 min

- pull resin through Wizard minicolumn

- wash with 80% ethanol

- spin in microcentrifuge to remove excess ethanol

- elute with 88 µl sterile nanopure water (allow water to remain on
column for at least 1 min)



¹ This amplification step with PFU Polymerase was incorporated for 2 reasons:

(a) To increase DNA yields for the production of large numbers of transformants.

(b) To reduce the amount of template DNA that is carried over from the mutagenic
PCR reaction. (Primers for the second amplification reaction are nested within the
mutagenic primers. The mutagenic primers were designed with non-specific tails
of 11 and 12 bases respectively for the upstream and downstream primers. The
nested primers will amplify DNA that was previously amplified with the
mutagenic primers, but cannot amplify pRAM template DNA.)
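
The relationship between the two primer pairs can be checked directly from the sequences given in this protocol. The short sketch below only tests whether each amplification primer matches the 5' end of the corresponding tailed mutagenic primer; the dictionary and function names are illustrative and not part of the protocol.

```python
# Primer sequences as listed in the protocol (5' to 3').
TAILED = {
    "pRAMtailup": "gtactgagacgacgccagcccaagcttaggcctgagtg",
    "pRAMtaildn": "ggcatgagcgtgaactgactgaactagcggccgccgag",
}
NESTED = {
    "pRAM18up": "gtactgagacgacgccag",
    "pRAM19dn": "ggcatgagcgtgaactgac",
}
PAIRS = [("pRAM18up", "pRAMtailup"), ("pRAM19dn", "pRAMtaildn")]

def matches_five_prime_end(nested_name, tailed_name):
    """True if the nested primer equals the 5' end of the tailed primer."""
    return TAILED[tailed_name].startswith(NESTED[nested_name])

results = {nested: matches_five_prime_end(nested, tailed) for nested, tailed in PAIRS}
for nested, tailed in PAIRS:
    print(nested, len(NESTED[nested]), "nt; matches 5' end of", tailed, ":", results[nested])
```

A match at the 5' end is consistent with the stated design goal: sequence introduced by the tailed primers is present in first-round products but not in the original pRAM template.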
Subcloning of amplified PCR mutagenesis products

1. Digest the DNA with SfiI as follows:

- 2 µl SfiI (Promega Part #R639a)

- 10 µl 10X buffer B (Promega Part #R002a)

- 88 µl of DNA from Wizard PCR prep (see step 3 [in
amplification])

- mix components and overlay with 2 drops of mineral oil; incubate
at 50°C for 1 hour

2. Remove salts and Sfi ends with Wizard PCR purification as described
herein, and elute into 50 µl sterile nanopure water

3. Ligation into pRAM (+/r) backbone (set up 4 ligations per group):

- 0.025 pmol pRAM backbone

- 0.05 pmol insert (usually in the range of 6 to 12 µl of insert)

- 1 µl of T4 DNA Ligase (M180a)

- 2 µl of 10x ligase buffer (C126b, divide into 25 µl aliquots, do not
freeze/thaw more than twice)

- water to 20 µl

- ligate for 2 hours at room temperature

- heat reactions for 15 min at 70°C to inactivate ligase

Transformation and plating

1. Butanol precipitate samples to remove excess salts (n-Butanol from
Sigma, St. Louis, Missouri, part#BT-105):

(if ethanol precipitation is used instead of butanol, wash with 70%
ethanol as needed) (excess salt will cause arcing during the electroporation, which
causes the reaction to fail)

- add water to 50 µl

- add 500 µl of n-butanol

- mix until butanol/ligation mix is clear and then spin for 20 min at
room temperature

- drain butanol into waste container in fume hood

- resuspend in 12 µl water, spin 30 sec at full speed

2. Preparation of cell/DNA mix (set up 4 transformations plus one with
reference clone DNA):

- while DNA is precipitating, place electroporation cuvettes on ice

- fill 15 ml Falcon snap-cap tubes with 3 ml S.O.C. medium and
place on ice

- thaw JM109 electrocompetent cells on ice (50 µl per ligation
reaction)

- pipette 10 µl of the bottom layer from step 1 (or 0.5 µl ref clone
DNA) into competent cells
(small amounts of butanol carry-over do not adversely affect the
transformation efficiency)

- place cell/DNA mix on ice
3. Electroporation:

- carry tubes, cuvettes, and cell/DNA mix on ice to electroporation
device

- pipette cell-DNA mix into a cuvette and zap. Instrument settings:
Cuvette gap: 0.2 cm
Voltage: 2.5 kV
Capacitance: 25
Resistance: 200 Ohms
Time constant: 4.5 msec

- pipette 1 ml SOC (contains KCl; media prep #KCLM) into
cuvette, quickly pour into recovery tube (transformation efficiency is reduced if
cells are allowed to sit in cuvette)

- place the recovery tube on ice until all samples are processed

- allow the cells to recover at 37°C for 30-60 min

- plate on LB+amp plates with nitrocellulose filters

(# of colonies is ~20% higher if cells recover 60 min, possibly due to cell
replication. See 101305 p. 65)
(Best colony density for screening is 500 per plate. For the current batch of
cells plate ~500 to 750 µl)
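
As a rough cross-check of the electroporation settings, the time constant of an ideal resistor-capacitor discharge is the product R x C. The sketch below assumes the capacitance is in microfarads (the unit is not given in the settings list) and models the sample as a resistance in parallel with the instrument resistor; both are assumptions made for illustration only.

```python
# Settings quoted in the protocol; the microfarad unit is an assumption.
RESISTANCE_OHM = 200.0
CAPACITANCE_F = 25e-6      # 25 microfarads (assumed unit)
MEASURED_TAU_S = 4.5e-3    # 4.5 msec as listed

# Ideal time constant with no current path through the sample.
ideal_tau_s = RESISTANCE_OHM * CAPACITANCE_F

# Effective resistance implied by the listed time constant.
effective_r_ohm = MEASURED_TAU_S / CAPACITANCE_F

# Sample resistance under the simple parallel-resistor model (assumption).
sample_r_ohm = 1.0 / (1.0 / effective_r_ohm - 1.0 / RESISTANCE_OHM)

print(f"ideal tau      = {ideal_tau_s * 1e3:.2f} ms")
print(f"effective R    = {effective_r_ohm:.0f} ohm")
print(f"implied sample = {sample_r_ohm:.0f} ohm")
```

Under these assumptions a time constant somewhat below the ideal value reflects current flowing through the sample itself, which fits the protocol's emphasis on removing excess salt before electroporation.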

F. Recombination Mutagenesis Protocol or DNA shuffling:

DNase I digestion of plasmid DNA
1. Prepare 2% low melting point gel

- use 0.8g agarose in 40 ml (NuSieve #50082)

- use large prep comb

- make sure it is solidified prior to digesting
2. Prepare 4 µg of pooled plasmid DNA for digest

3. Prepare 1 U/µl DNase dilution on ice according to the table below:



DNase I†              0.74 µl
10x DNase I buffer    10 µl
1% gelatin*           10 µl
Water                 to 100 µl

† DNase I from Sigma (D5791)

* Gelatin was added to keep the DNase I from sticking to the walls of the tubes.
This dilution can be kept on ice for at least 30 min without loss in activity.

4. Digest (set up at room temperature):

prepare two digests with 1.0U and 1.5U DNase I per 100 µl reaction:

- 10 µl of 10x DNase I buffer (500mM Tris, 10mM MgCl2 pH 7.8)

- X µl DNA (2 µg of pooled plasmid DNA from step 2)

- 1 or 1.5 µl of the 1U/µl enzyme dilution

- sterile nanopure water to 100 µl

- incubate at room temperature for 10 minutes

- stop reaction with 1 µl of 100mM CDTA



Purification from agarose gel

1. Run DNase digested fragments on gel

- add 10 µl of 10x blue juice to each DNase I digest

- load all on a 2% low melting point agarose gel

- run about 30 min at 120-150V

- load pGEM DNA marker in middle lane
2. Isolate fragments

- cut out agarose slice containing fragments in the size range of
600-1000bp using a razor blade

- cut into pieces that weigh ~0.3g

- melt the gel slices at 70°C

- add 300 µl of phenol (NaCl/Tris equilibrated) to the melted
agarose, vortex for ~1 min at max speed

- spin for 10 min at 4°C (the interface is less likely to move around
if it is done at 4°C)

- remove the top layer into a tube containing an equal volume of
Phenol/Chloroform/Isoamyl (saturated with 300mM NaCl/100mM Tris pH 8.0),
vortex and spin for 5 min at RT

- remove the top layer into a tube containing chloroform and vortex
and spin.

- remove the top layer into a tube with 2 vol. of 95% cold ethanol;
place in -70°C freezer for 10 min (no additional salts are needed because of the
High Salt Phenol)

- spin at 4°C for 15 minutes.

- wash with 70% ethanol, drain and air dry for ~10 min

- resuspend in 25 to 50 µl of sterile nanopure water

- store at -70°C until ready for use

Assembly reaction

Set up 4 reactions and pool when completed

Component            Concentration   Amount in µl   Final concentration

dATP                 10 mM           1              200 µM
dCTP                 10 mM           1              200 µM
dGTP                 10 mM           1              200 µM
dTTP                 10 mM           1              200 µM
DNA*                                 5
Tli                  3 U/µl          0.4            0.24 U/µl
10X Thermo buffer    10X             5              1X
MgCl2                25mM            4              2mM
gelatin              1%              5              0.1%
water                                To 50 µl

* Because the DNA used for this reaction has been fragmented, it is
difficult to estimate a concentration. The easiest way is to load 5 µl of the DNase I
digested DNA to an agarose gel and run the gel until the dye enters the wells (1-2
min). Fragments from a typical 2 µg DNA digest which were resuspended in 100
µl of water will give a DNA concentration of ~1 to 10 ng/µl. See 101284 p.30 for
a photo of this type of gel.
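
The final concentrations in the assembly reaction follow from simple dilution: stock concentration times volume added, divided by the 50 µl reaction volume. The sketch below recomputes three of the rows as an illustration; the helper name and the choice of rows are ours, not part of the protocol.

```python
REACTION_VOLUME_UL = 50.0

def final_concentration(stock, volume_ul, total_ul=REACTION_VOLUME_UL):
    """Dilution rule C1 * V1 = C2 * V2, solved for C2 (same unit as stock)."""
    return stock * volume_ul / total_ul

# (component, stock concentration, volume added in µl, unit)
rows = [
    ("dNTP (each)", 10_000.0, 1.0, "µM"),  # 10 mM stock expressed in µM
    ("MgCl2", 25.0, 4.0, "mM"),
    ("gelatin", 1.0, 5.0, "%"),
]

finals = {name: final_concentration(stock, vol) for name, stock, vol, _ in rows}
for name, _, _, unit in rows:
    print(f"{name}: {finals[name]:g} {unit}")
```

The same rule applies to the enzyme and buffer rows when they are expressed in U/µl or as an X-fold concentrate.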

Cycling conditions: 94-30sec [94-20sec, 65-1min, 72-2min] 25x (Program
"assembly-65", runs ~2.5 h)

Amplification of assembly
Usually 5 amplification reactions will produce enough DNA for a full 8
plate robotic run



Component                 Concentration   Amount in µl   Final concentration

dATP                      10 mM           1              200 µM
dCTP                      10 mM           1              200 µM
dGTP                      10 mM           1              200 µM
dTTP                      10 mM           1              200 µM
pRAMtailup*               20 pmol/µl      2              0.8 pmol/µl
pRAMtaildn*               20 pmol/µl      2              0.8 pmol/µl
PFU native polymerase+    2 U/µl          1              0.04 U/µl
10x native PFU buffer°    10x             5              1x
DNA                                       5
water                                     to 50 µl

* Note that the concentration of primers is twice as high as in a typical
amplification reaction.

° The PFU 10X buffer contains 20mM MgCl2, so it is not necessary to add
MgCl2.

+ PFU is ordered from Stratagene part #600135.
Cycling conditions: 94-30sec [94-20sec, 65-1min, 72-3min] 25x

Subcloning of assembly amplification

1. Purify amplification products with Wizard PCR purification:

- pool 5 amplification reactions

- transfer into a new tube that contains 100 µl of Direct Purification
buffer

- add 1 ml of Wizard PCR Purification Resin, incubate at RT for 1
min

- pull resin through Wizard minicolumn

- wash with 80% ethanol and spin in microcentrifuge to remove
excess ethanol

- elute with 88 µl of sterile nanopure water (allow water to remain
on column for at least 1 min)

2. Digest with SfiI:

- 2 µl SfiI

- 10 µl 10x buffer B

- 88 µl of DNA from Wizard PCR prep

- mix components and overlay with 2 drops of mineral oil; incubate
at 50°C for 1 hour

3. Band isolation:

Sometimes after amplification of the assembly reaction a band that is
smaller than the gene-sized fragment is produced. This small fragment has been
shown to subclone about 10-fold more frequently than the gene-sized fragment if
the sample is not band isolated. When this contaminating band is present, it is
necessary to band isolate after SfiI digestion.

- load the DNA onto a 0.7% agarose gel

- band isolate and purify with the Gene Clean kit from Bio 101

- elute DNA with 50 µl sterile nanopure water, check concentration
on gel (This type of purification with standard agarose produced the highest
number of transformants after subcloning. Other methods tried: low melt with
phenol chloroform, Gene Clean with low melt, Wizard PCR resin with standard
agarose, Pierce Xtreme spin column with low melt (did not work with standard
agarose)).

4. Ligate into pRAM [+/r] backbone: (See ligation and transformation
protocol above)

Large scale preparation of pRAM backbone

1. Streak an LB amp plate with pRAMMCS [+/r]. (This vector contains a
synthetic insert with a SacII site in place of a gene. It can be found at
-70°C in the box listed pRAM glycerol stocks, position hi. This vector
contains the new ribosome binding site, but it will be cut out when the
vector is digested with SfiI.)

2. Prepare a 10 ml overnight culture in LB supplemented with amp.

3. The next day inoculate 1 L of LB supplemented with amp and grow for
16-20 hours.

4. Purify the DNA with the Wizard Maxi Prep kit (use 4 preps for 1 L of
cells).

5. Digest the plasmid with SfiI (use 5U per microgram). Overlay with
mineral oil and digest for at least two hours.

6. Ethanol precipitate to remove salts. Resuspend in water.

7. Digest with SacII for 2 hours (keep digest volume to 2 ml or less).
It is possible that part of the plasmid could be partially digested. If the
vector is cut with an enzyme that is internal to the two SfiI sites, it will
keep the partially digested fragments from joining in a ligation
reaction.

8. Load entire digest onto a column (see 9). The volume of the sample
load should not be more than 2 ml. If it is, it will be necessary to
ethanol precipitate.

9. The column contains Sephacryl S-1000 and is stored with 20% ethanol
to prevent bacterial contamination. Prior to loading the sample the
column must be equilibrated with cold running buffer for at least 24
hours. If the column has been sitting more than a couple of months it
may be necessary to empty the column, equilibrate the resin with 3-4
washes in cold running buffer, and then re-pour the column. After the
column is poured it should be equilibrated overnight so that the resin
is completely packed.

10. Collect fractions of ~0.5 ml. Typically the DNA comes off between
fractions 25 and 50. Load a 5 µl aliquot from a range of fractions
to determine which fractions contain the backbone fragment. The
small insert fragment will start to come off the column before all of
the backbone is eluted, so it will be necessary to be conservative when
fractions are pooled. For this reason typically 40-60% of the DNA is
lost at this step.

11. Pool the fractions that contain the backbone.

12. Ethanol precipitate the samples. Resuspend in a volume that produces
~10-50 ng/µl.

13. Store at -70°C.

Column running buffer: (store at 4°C)
5 mM EDTA
100 mM NaCl
50 mM Tris-HCl pH 8.0
10 µg/ml tRNA (R-8759)

H. Oligonucleotide Mutagenesis:

Prepare ampicillin-sensitive single-stranded DNA of the template to be mutated.
Design a mutagenic primer that will randomly generate all possible amino acid
codons.
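
To illustrate what randomizing a codon covers, the sketch below enumerates every triplet of a fully degenerate NNN codon and translates it with the standard genetic code. The protocol does not state which degeneracy scheme was used for the mutagenic primer, so NNN is an assumption made here for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (first base varies slowest, third base fastest); '*' marks a stop codon.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO)
}

# Number of codons per amino acid when all 64 triplets are equally likely.
counts = Counter(CODON_TABLE.values())
stop_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "*")
amino_acids = sorted(aa for aa in counts if aa != "*")

print(len(CODON_TABLE), "codons,", len(amino_acids), "amino acids,",
      len(stop_codons), "stop codons")
print("most represented:", counts.most_common(3))
```

Under full randomization the amino acids are not equally represented, since some are encoded by six codons and others by one, and a small fraction of clones acquire a stop codon at the randomized position.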

Mutagenesis reaction:

Component                                  Final concentration

Single Stranded Template                   0.05pmol
Mutagenic Oligo                            1.25pmol
Ampicillin Repair Oligo (Promega q631a)    0.25pmol
10X annealing buffer                       1X
Water                                      to 20 µl

*Annealing buffer:

-200mM Tris-HCl, pH 7.5
-100mM MgCl2
-500mM NaCl



Heat reaction at 60°C for 15 minutes and then immediately place on ice.
Synthesis reaction:

Component                            Amount

Water                                5 µl
10X synthesis buffer                 3 µl
T4 DNA Polymerase (Promega m421a)    1 µl (10 Units)
T4 DNA Ligase (Promega 180a)         1 µl (3 Units)

*Synthesis buffer

100mM Tris-HCl, pH 7.5
5mM dNTPs
10mM ATP
20mM DTT

Incubate at 37°C for 90 minutes.

Transform into Mut-S strain BMH 71-18 (Promega strain Q6321)

-Place synthesis reaction in a 17x100mm tube.

-Add BMH 71-18 competent cells that have been thawed on ice to
synthesis reaction.

-Incubate on ice for 30 min.

-Heat shock cells at 42°C for 90 seconds.

-Add 4 ml of LB medium and grow cells at 37°C for 1 hour. Add
ampicillin to a final concentration of 1.25 µg/ml and then grow overnight at 37°C.
Isolate DNA with Wizard Plus Purification system (Promega a7100)

Transform isolated DNA into JM109 electro-competent cells and plate
onto LB ampicillin plates.

I. Screening procedure:

JM109 clones (from a transformation reaction) are plated onto
nitrocellulose filters placed on LB amp plates at a screening density of ~500
colonies per plate.

As listed in the Random Mutagenesis procedure, approximately 10% of the
clones to be selected will have to be as stable as, or more stable than, the
source sequence. Or stated another way, ~50 colonies per plate will be suitable for
selection. There are 704 wells available for a full eight plate robotic run, so at
least 15 LB amp plates will be needed for a full robotic run.
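
The plate count quoted above follows from the screening density and the fraction of colonies expected to be suitable. A minimal sketch of that arithmetic is shown below; the function and variable names are illustrative only.

```python
import math

def plates_needed(wells, colonies_per_plate, suitable_fraction):
    """Smallest number of plates that yields at least `wells` suitable colonies."""
    suitable_per_plate = colonies_per_plate * suitable_fraction
    return math.ceil(wells / suitable_per_plate)

WELLS_PER_RUN = 704        # wells available in a full eight plate robotic run
COLONIES_PER_PLATE = 500   # screening density
SUITABLE_FRACTION = 0.10   # roughly 10% of colonies are suitable

plates = plates_needed(WELLS_PER_RUN, COLONIES_PER_PLATE, SUITABLE_FRACTION)
print(f"{plates} LB amp plates needed")
```

With about 50 suitable colonies per plate, 14 plates would supply only 700 colonies, so the count rounds up to 15.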

After overnight growth at 37°C the plates containing the transformants are
removed from the incubator and placed at room temperature.

The nitrocellulose filter is lifted on one side and 500 µl of 10mM IPTG is
added to each of the plates. The filter is then placed back onto the plate to allow
diffusion of the IPTG into the colonies containing the different mutant luciferase
genes. The plates are then incubated for about 4 hours at room temperature.

One (1) ml of a solution containing 1mM luciferin and 100mM sodium
citrate is pipetted onto a slide warmer that is set at 50°C. A nitrocellulose filter
that contains mutant luciferase colonies and has been treated with IPTG is then
placed on top of the luciferin solution. After several minutes, the brightest
colonies are picked with toothpicks, which are used to inoculate wells in a
microtiter plate that contain M9-minimal media with 1% gelatin.

After enough colonies are picked to fill 8 microtiter plates, the plates are
placed in an incubator at 350 rpm and 30°C and are grown overnight.

In the morning the overnight plates are loaded onto the robot and the cell
dilution procedure is run. (This procedure dilutes the cultures 1:10 into induction
medium.) The new plates are grown for 3 hours at 350 rpm at 30°C.

After growth, the plates are loaded onto the robot for the main assay
procedure.

Minimal Media:

6g/Liter Na2HPO4
3g/Liter KH2PO4
0.5g/Liter NaCl
1g/Liter NH4Cl
2mM MgSO4
0.1 mM
1mM Thiamine-HCl
0.2% glucose
12 µg/ml tetracycline
100 µg/ml ampicillin

*Overnight media contains 1% gelatin
*Induction media contains 1mM IPTG and no gelatin.
25 S.O.C. Media 

-lOmMNaCl 
-2.5mMKC] 
-20mMMgCJ 
'20mM glucose 
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-2% bactotryptone 
-0.5% yeast extract 
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TABLE 1: Parameters Characterizing Luciferases of Clones Derived for 
Various Experiments 



Control is PPE-2 39-5B10 at 51°C.

Experiment  Clone ID  Li       tau      Km       S
40          0a7       1.04     4.5      0.78     1
40          5h4       1.29     1.61     1.16     0.953
40          0c2       1.13     1.54     0.91     0.998
40          5g4       1        1.4      0.85     1
40          6d3       1.02     1.37     0.79     1
40          1g4       1.06     1.28     0.77     0.985
40          1d4       1.69     1.23     0.73     1
40          0h9       1.26     1.21     0.63     0.998
40          2f6       3        1.07     0.49     0.981
40          7d6       3.09     1.058    1.09     1.013
40          5a7       4.3      1.025    0.93     1.008
40          4c8       1        1        0.33     1.004


Experiment  Clone ID  Li       tau      Km       S
41          7h7       0.73     2.4      2.1      0.995
41          5a5       0.77     1.93     2.7      1.002
41          2c12      1.06     1.7      0.91     1.003
41          6e5-      1.16     1.62     1.53     0.997
41          4e5-      1.08     1.37     1.4      1.004
41          6g7       1.3      1.27     1.39     0.999
41          1h4       1.36     1.24     0.56     0.994
41          0c11      4.1      1.23     1.24     0.996
41          2h9       5.3      1.01     0.83     0.986
42          6b10      0.97     3.6      0.97     0.997
42          1c3       0.91     2.1      0.6      0.998
42          7h9       0.8      1.8      0.8      0.982
42          6b2       0.77     1.72     0.8      0.978
42          6d6       0.83     1.7      0.733    0.975
42          4e10-     0.77     1.63     1.8      0.954
42          1b5       0.83     1.41     1.05     0.955
42          6e6-      0.71     1.16     0.89     0.955
42          3a9       0.85     1.3      0.86     0.997
42          6b6       2.7      1.3      0.91     1.02
42          6e9-      1.5      1.27     0.98     1.01
42          3h11      1.73     1.21     0.63     0.985
42          1a2       1.11     1.17     0.77     1.005
42          3f7       0.49     1.16     1.13     0.944
42          1a4       2        1.01     0.76     0.996



Control is PPE-2 40-0A7 at 54°C.

Experiment  Clone ID  Li       tau      Km       S
46          2h3       0.86     6.4      0.37     0.96
46          4a9       0.67     5.7      0.66     0.997
46          2g4       0.65     5.3      0.78     0.96
46          5d12      0.94     4.9      0.94     1.002
46          1h11      1.02     4.8      0.84     0.998
46          5a10      1.23     4.4      0.81     0.9842
46          0a8       1.35     4.3      0.89     1
46          4d3       0.51     3.6      0.65     0.975
46          2a3       1.17     2.9      0.57     0.988
46          3b11      1.39     2.5      0.63     1.02
46          7g12      1.49     2.5      0.91     1.02
46          0g9       1.86     2.25     0.5      0.998
46          7h8       1.07     1.36     0.52     0.99
46          1g8       0.3      1.31     0.72     0.92
46          1d3       1.74     1.13     1.02     1.001
46          0c3       1.68     1.01     0.74     1.01
46          5c11      0.82     1.01     0.6      0.95


Control is PPE-2 46-2h3 at 54°C.

Experiment  Clone ID  Li       tau      Km       S
49          6c10      0.57     2.2      0.98     1
49          7c6       1.12     1.9      0.93     1.01
49          0g12      1        1.58     0.69     1.08
49          7a5       1.08     1.44     1.1      0.99
49          1f6       0.66     1.13     1.04     1.006
49          0b5       0.76     1.07     1.03     0.98
49          4a3       0.94     1.06     0.77     1

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Control is 
PPE-2 49-7C6 at 56°C.

Experiment  Clone ID  Li       tau      Km       S
56          2d12      0.97     2.9      0.29     1.006
56          5g10      1.01     2.77     0.64     1.007
56          3d5       1.32     2.25     1.85     1.03


Experiment  Clone ID  Li       tau      Km       S
57          3d1       1.06     2.9      1.05     1.02
57          6g12      1        2.7      0.87     1.004
57          4c1       0.79     2.6      0.93     1.014
57          5f10      0.72     1.9      0.64     1.03
57          1e6-      0.84     1.49     0.984    0.9871
57          1h2       0.94     1.43     0.68     0.991
57          2a6       1.08     1.08     0.89     0.9976


Experiment    Clone ID      Li            tau           Km            S
[illegible]   [illegible]   [illegible]   [illegible]   [illegible]   1.02
58            [illegible]   1.53          8.5           1.56          1.05
58            1h1           [illegible]   [illegible]   [illegible]   [illegible]
58            3g1           1             7.34          0.62          1.006
58            0f3           1.31          6.9           0.57          0.98
58            3e12-         1.06          6.3           0.47          0.996
58            0c7           1.9           4             0.64          1.06
58            0d1           1.03          3.76          0.49          1.03
58            3c7           1.49          3.4           0.55          1.04
58            2a2           1.4           2.2           0.5           1.05
58            2a8           3.2           2             0.81          1.05
58            0f2           2.2           1.92          0.45          1.04
58            1b4           5.1           1.87          1.08          1.09
58            2b3           2.7           1.55          0.57          1.04
58            4g1           4.9           1.2           0.72          1.06



Control is PPE-2 58-0A5 at 58°C.

Experiment  Clone ID  Li       tau      Km       S
61          4e9-      1.03     1.84     0.76     1.01
61          1f1       1.02     1.43     0.7      1
61          2e12-     1.56     1.34     0.48     1.003
61          2f2       1.5      1.3      0.32     1.01
61          6b4       1.2      1.26     0.88     0.98
61          4c10      1.46     1.12     1.06     0.99
61          4g11      1.31     1.03     1.43     1.03
61          2f1       1.41     1.02     0.79     0.995
61          2g1       1.3      1        1.17     1


Experiment  Clone ID  Li       tau      Km       S
65          6a12      0.87     2.3      0.73     0.9605
65          1h6       0.84     2.2      1.62     0.9598
65          7f5       1.2      1.56     2.07     1.0087
65          5a5       2.3      1.49     0.45     0.9985
65          7h2       1.56     1.27     0.91     1.0658
65          7b2       1.98     1.16     0.6      0.9289
65          0g9       1.36     1.09     1.46     0.9927
65          6c7       1.48     1.06     0.86     0.9967
65          1e12-     1.59     1.05     1.03     0.9582
65          4e2-      1.21     1.05     1.11     0.943
65          6a10      1.7      1.04     0.93     0.992
65          4b9       1.48     1.04     1.61     1.0009
65          6c1       1.36     1.02     0.72     0.9978



Experiment  Clone ID  Li       tau      Km       S
68          2g6       1.39     3.9      1.17     0.9955
68          4g3       2        2.5      0.27     0.9927
68          5a3       1.04     1.64     0.65     0.8984
68          2b7       1.04     1.64     5.2      0.9237
68          5c10      2.75     1.36     0.73     1.0078
68          7d12      1.85     1.32     0.66     1.0084
68          7b9       1.8      1.19     0.56     1.0052
68          7b3       1.2      1.16     0.55     0.9951
68          1g10      1.48     1.05     1.22     1.0025


WO 99/14336                                                PCT/US98/19494

Experiment  Clone ID  Li       tau      Km       S
70          2a7       1.94     4.6      0.7      1.0015
70          3d6       3.5      4.2      0.18     1.03
70          4f8       1.87     4.2      0.69     0.9979
70          7h5       2.4      2.6      0.18     1
70          5h6       3.1      2.3      0.6      0.999
70          7d6       3        2.2      2.29     0.9989
70          5a3       3.1      1.5      0.18     1.0058
70          7d2       2.5      1.4      0.66     1.0126
70          3h7       3.2      1.22     0.23     1.002
70          0h5       2.5      1.15     0.36     0.9992
70          0d7       1.86     1        1.83     0.993
70          1g12      2.42     1        0.26     0.965


Experiment  Clone ID  Li       tau      Km       S
71          1d10      1.6      4.5      1.06     1.0065
71          6f11      1.8      4.3      0.98     0.953
71          7h4       3.4      3.6      0.56     1.0045
71          4h3       3.1      3.1      0.42     1.0171
71          1h5       1.31     3.01     1.31     0.9421
71          5e4-      5.4      2.3      0.35     0.994
71          5c1       2.2      2.3      0.89     0.9746
71          0h7       3.6      1.8      0.59     1.0197
71          6h9       23.7     1.71     0.91     1.0064
71          7e3-      5.3      1.7      0.7      1.0028
71          5d4       11.1     1.48     0.35     1.0213
71          2e3-      4        1.47     0.45     0.9654
71          6h11      17.7     1.15     2.8      1.0064
71          2e10-     3        1.1      0.66     0.9588
71          2g2       4.4      1.01     0.44     1.0046



Control is PPE-2 71-5D4 at 60°C.

Experiment  Clone ID  Li       tau      Km       S
72          2g6       0.38     3.1      1.58     1.0052
72          5f12      0.81     1.53     1.02     0.9678
72          0d7       0.76     1.44     1.4      0.9838
72          5c12      0.87     1.43     1.04     0.9718
72          1e1-      1.04     1.41     1.15     0.9956
72          5b12      0.83     1.41     1.02     0.9731
72          0b7       1.11     1.04     0.91     1.0049
72          3b4       0.49     1.03     2.2      0.9581


Experiment  Clone ID  Li       tau      Km       S
73          2h8       0.85     1.9      1.08     1.0123
73          4e6-      0.95     1.76     0.94     0.9939
73          3g8       0.86     1.53     1.04     1
73          1g3       1.7      1.14     0.97     0.9921


Experiment  Clone ID  Li       tau      Km       S
74          2a9       0.96     1.77     0.86     0.999
74          4e10-     0.8      1.36     1.33     0.09897
74          0d5       1.69     1.28     0.61     0.9927
74          6g7       1.75     1.07     1.33     1.0022
74          5d8       0.46     1.06     0.95     0.899
74          5e7-      1.22     1.05     0.87     0.9977
74          6e1-      1.19     1.02     0.96     0.999


Experiment  Clone ID  Li       tau      Km       S
76          6c3       2.3      6.4      1.2      0.9865
76          2a9       0.93     4.7      1.08     0.999
76          3h9       1.26     2.6      1.02     0.9973
76          0b10      1.52     2.4      1.4      0.992
76          0h9       1.71     1.44     1.05     1.018
76          2e9-      0.44     1.15     1.2      0.9318
76          0e10-     1.67     1.1      1.02     1.014
76          0c10      1.13     1.05     1        0.9974
76          3e8-      1.35     1.03     1.1      0.9894
76          0d12      0.69     1        0.92     0.932
76          0f10      0.62     1        1.2      0.9478


Experiment    Clone ID      Li            tau           Km            S
78            1e1-          0.54          8.9           1.15          0.9877
78            0h7           1.4           5             0.97          1.014
78            0a6           1             4.3           1.5           0.9967
78            0b10          1.93          2             1             0.9926
78            0f11          1.6           2             0.91          0.9905
78            3f1           2.4           1.7           1.09          0.9936
78            2b4           1.97          1.36          0.98          1.0094
78            5b3           3.2           1.19          1.03          0.9735
78            [illegible]   [illegible]   1.03          1             [illegible]
78            [illegible]   [illegible]   1             1             [illegible]


Control is PPE-2 78-0B10 at 62°C.

Experiment  Clone ID  Li       tau      Km       S
82          2g12      0.9811   2.09     0.8851   0.9939
82          4b9       1.0845   1.8419   0.8439   1.0078
82          0d1       0.7622   1.5171   1.11     0.9998
82          3g1       0.8805   1.504    0.9629   0.9927
82          1d1       0.9741   1.4497   0.8936   0.9986
82          1e8-      0.8206   1.4433   0.9876   0.9968
82          0h9       1.1355   1.3626   0.9171   1.0094
82          2c6       1.0931   1.3402   0.9482   1.0022
82          3a9       1.0364   1.251    0.968    1.0009
82          4h8       0.8816   1.1667   0.9165   1.0045
82          0a10      1.0535   1.1128   1.0413   1
82          4g1       1.4305   1.0862   1.1734   1.0059


Experiment    Clone ID      Li            tau           Km            S
[illegible]   [illegible]   [illegible]   [illegible]   [illegible]   [illegible]
[illegible]   2h9           0.4264        28.7958       [illegible]   [illegible]
84(121)       3f7           0.4161        25.3058       1.8079        0.8988
84(121)       2h10          0.9667        14.4658       0.8073        0.9947
84(121)       3a2           0.3329        12.6          2.5444        0.855
84(121)       3a6           1.2299        7.2384        0.7866        1.0046
84(121)       5b12          1.0535        6.0315        0.7824        1.0056
84(121)       5a7           1.0413        4.9054        0.8864        1.0071
84(121)       3d2           0.2032        4.8           2.4623        0.7973
84(121)       2a9           1.0847        4.7486        0.7746        1.0051
84(121)       5e11-         1.1918        4.0988        0.872         1.008
84(121)       7h2           0.9116        3.9929        0.909         1.0077
84(121)       3b5           1.2014        3.8251        0.7509        1.0086
84(121)       1f8           1.07          3.06          0.8276        1.0093
84(121)       2e2-          1.4356        1.9315        0.7863        1.0175



Control is PPE-2 84-3a6 at 64°C.

Experiment  Clone ID  Li       tau      Km       S
85(86)      2a2       0.2266   12.9013  3.326    0.8705
85(86)      4f12      1.1167   4.7851   0.7439   1.0092
85(86)      4e9-      1.0869   4.4953   0.8539   1.0068
85(86)      1f11      0.6994   4.0976   0.842    1.0124
85(86)      5a4       1.2273   4.09     0.9683   1.0098
85(86)      3e10-     0.8902   3.5342   0.8106   1.0069
85(86)      3e12-     1.0512   3.4883   0.853    1.0054
85(86)      5e4-      0.9562   3.3886   1.0328   1.0069
85(86)      0e6-      0.1494   3.0145   3.6293   0.8269
85(86)      6b1       0.7615   2.5712   0.8695   1.0055
85(86)      6h7       1.0285   2.5401   0.8963   1.0057
85(86)      4b11      0.9816   2.3899   0.7927   1.0063
85(86)      6d7       1.1087   2.0607   0.9042   1.0088
85(86)      2e10-     0.3028   2.0603   1.9649   0.8738
85(86)      2a9       1.448    1.1819   0.9722   1.0046


Control is PPE-2 85-4f12 at 65°C.

Experiment  Clone ID  Li       tau      Km       S
88          3c1       1.4439   2.0938   0.9874   0.9976
88          6g1       1.0184   1.2665   1.2184   1.0019
88          3e4-      1.331    1.0996   1.0669   0.9983


Experiment  Clone ID  Li       tau      Km       S
89          1a4       1.2565   2.4796   1.0338   0.997
89          3b1       0.7337   1.9976   0.9628   1.0001
89          2b12      1.0505   1.8496   1.0069   1.0012
89          0b5       1.5671   1.1362   1.0912   0.9995
89          1f1       1.378    1.1018   0.9804   0.996
89          2f1       1.4637   1.0894   0.9189   0.9992


Experiment    Clone ID      Li            tau           Km            S
90            [illegible]   1.4081        1.3632        1.027         0.9987
90            1b5           1.4743        1.1154        1.0812        1.0011
90            6g5           1.2756        1.0605        1.0462        1.0012
90            5e6-          1.0556        1.0569        1.1037        1.0011
90            4e3-          1.2934        1.0291        1.0733        1.0002
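
By way of illustration only, rows of Table 1 may be screened programmatically. The following Python sketch uses four legible rows of the Experiment 90 block and keeps clones whose Li and tau values both meet a threshold, ordered by tau. The function name and the threshold values are hypothetical and are not taken from this document.

```python
# Legible rows of the Experiment 90 block of Table 1: (clone, Li, tau, Km, S).
ROWS_90 = [
    ("1b5", 1.4743, 1.1154, 1.0812, 1.0011),
    ("6g5", 1.2756, 1.0605, 1.0462, 1.0012),
    ("5e6-", 1.0556, 1.0569, 1.1037, 1.0011),
    ("4e3-", 1.2934, 1.0291, 1.0733, 1.0002),
]


def select_clones(rows, min_li=1.0, min_tau=1.0):
    """Return clone IDs whose Li and tau meet the (hypothetical) thresholds,
    sorted from highest to lowest tau."""
    kept = [row for row in rows if row[1] >= min_li and row[2] >= min_tau]
    kept.sort(key=lambda row: row[2], reverse=True)
    return [row[0] for row in kept]


if __name__ == "__main__":
    print(select_clones(ROWS_90))
    print(select_clones(ROWS_90, min_li=1.25))
```

Raising min_li in this sketch drops clone 5e6-, whose Li is the lowest of the four rows.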
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TABLE 2: Stability Of Luciferase Activity At Different Temperatures (Half- 
Life In Hours) 





                Room Temperature   37°C     50°C     60°C
Luc[T249M]      110                0.59     0.01
49-7C6          430                68       31       6.3
78-0B10         3000               220      47       15
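
A half-life can be related to the fraction of activity lost over a given incubation time. The following Python sketch assumes simple first-order (exponential) decay, which is an assumption made here for illustration and is not stated in Table 2. It uses the 50°C half-lives of Table 2; the function names are hypothetical.

```python
# Half-lives in hours at 50°C, as listed in Table 2.
HALF_LIFE_50C = {"Luc[T249M]": 0.01, "49-7C6": 31.0, "78-0B10": 47.0}


def fraction_remaining(hours, half_life_hours):
    """Fraction of activity left after `hours`, assuming first-order decay."""
    return 0.5 ** (hours / half_life_hours)


def percent_lost(hours, half_life_hours):
    """Percent of activity lost after `hours`, assuming first-order decay."""
    return 100.0 * (1.0 - fraction_remaining(hours, half_life_hours))


if __name__ == "__main__":
    for name, half_life in HALF_LIFE_50C.items():
        print(name, round(percent_lost(2.0, half_life), 2))
```

Under this assumption, a half-life of 31 hours or more at 50°C corresponds to a loss of less than 5% of activity after 2 hours at that temperature.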
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TABLE 3: Michaelis-Menten Constants for Mutants Created by Directed 
Evolution 





                Km-luciferin   Km-ATP
Luc[T249M]      0.32 µM        18 µM
49-7C6          0.99 µM        14 µM
78-0B10         1.6 µM         3.4 µM
90-1B5          2.2 µM         3.0 µM
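
The effect of a Michaelis-Menten constant on reaction rate may be illustrated with the standard rate law v/Vmax = [S]/(Km + [S]). The following Python sketch is a minimal example; the function name is hypothetical, and the substrate concentration and Km need only be expressed in the same units.

```python
def relative_rate(substrate, km):
    """Michaelis-Menten rate as a fraction of Vmax: [S] / (Km + [S]).

    `substrate` and `km` must be given in the same concentration units.
    """
    if substrate < 0 or km <= 0:
        raise ValueError("substrate must be >= 0 and km must be > 0")
    return substrate / (km + substrate)


if __name__ == "__main__":
    # Km-ATP values from the first and last rows of Table 3, same units as [S].
    for name, km_atp in (("first row", 18.0), ("90-1B5", 3.0)):
        print(name, round(relative_rate(3.0, km_atp), 3))
```

At a substrate concentration equal to Km the rate is half of Vmax, so at a fixed sub-saturating ATP concentration an enzyme with a lower Km-ATP operates closer to its maximal rate.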
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TABLE 4: 



Components     Concentration   Amount in 50 µl          Final concentration

dATP           10 mM                                    0.2 mM
dCTP           10 mM                                    0.2 mM
dGTP           10 mM                                    0.2 mM
dTTP           10 mM                                    0.2 mM
+pRAM18up      20 pmol/µl                               0.4 pmol/µl
+pRAM19dn      20 pmol/µl                               0.4 pmol/µl
Pfu            2 U/µl                                   0.04 U/µl
**10x buffer   10x             5                        1x
DNA                            10 from purified wiz.
Water                          24.6
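
The stock and final concentrations of Table 4 are related to the volume of stock added by the usual dilution relation C1 x V1 = C2 x V2. The following Python sketch computes the stock volume for a 50 µl reaction. The function name is hypothetical, and the computed volumes are derived values for illustration; they are not entries recovered from the table.

```python
def stock_volume_ul(stock_conc, final_conc, total_volume_ul=50.0):
    """Volume of stock (in µl) giving `final_conc` in `total_volume_ul`.

    Uses C1 * V1 = C2 * V2; both concentrations must share the same units.
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * total_volume_ul / stock_conc


if __name__ == "__main__":
    # (component, stock concentration, final concentration) from Table 4.
    components = [
        ("dATP", 10.0, 0.2),        # mM
        ("+pRAM18up", 20.0, 0.4),   # pmol/µl
        ("Pfu", 2.0, 0.04),         # U/µl
        ("10x buffer", 10.0, 1.0),  # x
    ]
    for name, stock, final in components:
        print(name, round(stock_volume_ul(stock, final), 3), "µl")
```

For the 10x buffer this relation gives 5 µl, which agrees with the amount listed in the table.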
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TABLE 5: 


Summary of Evolutionary Progression

Start with LucPpe2[T249M]
Mutate 3 amino acids at N- and C-termini
Mutate 7 cysteines
Perform two iterations of evolution          Luc49-7C6
Mutagenesis of altered codons (9)
Two iterations of evolution                  Luc78-0B10
Mutagenesis of consensus codons (28)
Mutagenesis of codon usage (24)              Luc90-1B5
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TABLE 6: One Iteration of Recursive Process 

1 clone → 3 libraries using error-prone PCR

    • 3 x Visual screen (~10,000 clones each)

    • 3 x Quantitative screen (704 clones each)

3 x 18 clones → library using sPCR

    • Visual screen (~10,000 clones)

    • Quantitative screen (704 clones)

18 + 18 → library using sPCR

    • Visual screen (~10,000 clones)

    • Quantitative screen (704 clones)

Output: 18 clones
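
The screening workload of one iteration of Table 6 can be tallied as follows. This Python sketch treats the approximate figure of 10,000 clones per visual screen as exact, which is an assumption made for illustration; the data layout and function name are hypothetical.

```python
# One iteration of the recursive process, following Table 6.
# Each step: (description, number of libraries, clones per visual screen,
# clones per quantitative screen).
STEPS = [
    ("error-prone PCR libraries", 3, 10_000, 704),
    ("sPCR library from 3 x 18 clones", 1, 10_000, 704),
    ("sPCR library from 18 + 18 clones", 1, 10_000, 704),
]


def screening_totals(steps):
    """Return (total clones screened visually, total screened quantitatively)."""
    visual = sum(libraries * per_visual for _, libraries, per_visual, _ in steps)
    quantitative = sum(libraries * per_quant for _, libraries, _, per_quant in steps)
    return visual, quantitative


if __name__ == "__main__":
    visual_total, quantitative_total = screening_totals(STEPS)
    print(visual_total, quantitative_total)
```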
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WE CLAIM: 

1. A second beetle luciferase with increased thermostability as 
compared with a first luciferase, said second luciferase made by the following 
method: 

a) mutating a polynucleotide sequence encoding the first 
luciferase to obtain a polynucleotide sequence encoding the second luciferase; 

b) selecting the second luciferase if a plurality of characteristics 
including thermostability of a luciferase is in a preferred range. 

2. The second luciferase of claim 1, wherein the polynucleotide 
sequence encoding the first luciferase is the same as the sequence of Luc 
(T249M). 

3. The second luciferase of claim 1, wherein thermostability is at least 
2 hours at about 50°C in aqueous solution. 

4. The second luciferase of claim 3, wherein thermostability is at least
5 hours at 50°C in aqueous solution.

5. The second luciferase of claim 1, wherein the plurality of 
characteristics comprises brightness of luminescence, substrate utilization and 
luminescence signal. 

6. The second luciferase of claim 1, wherein the mutating is by 
directed evolution. 

7. A beetle luciferase that is thermostable for at least 2 hours at 50°C
in aqueous solution.

8. The luciferase of claim 7, that is thermostable for at least 5 hours at
50°C.

9. The luciferase of claim 7, wherein less than 5% luminescence
activity is lost after incubation in solution for 2 hours at about 50°C.

10. A method for preparing a beetle luciferase with increased 
thermostability, said method comprising the following steps: 

a) mutating a polynucleotide sequence encoding a first 
luciferase to obtain a sequence encoding a second luciferase; and 

b) selecting the second luciferase if a plurality of characteristics 
including thermostability of a luciferase are in a preferred range. 

11. The method of claim 10, wherein thermostability is at least 2 hours
at 50°C.

12. The method of claim 11, wherein the thermostability is at least 5 
hours at 50°C. 

13. The method of claim 10, wherein mutating occurs at at least one 
position wherein a consensus amino acid is present in beetle species. 

14. The method of claim 10, wherein mutating occurs at at least one
position where a mutation occurred to produce the luciferase gene designated
luc90-1B5.

15. A DNA molecule having a nucleotide sequence that encodes a
mutant luciferase with increased thermostability as compared to the
thermostability of a native luciferase.

16. The DNA molecule of claim 15, wherein the nucleotide sequence is 
selected from the group consisting of sequences. 
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GGATCCAATGGAAGATAAAT^TATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGC GAAAATGGTTTGCAATTTTTCCTT CCTATAATTGCATCATTGTATCTT GG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAT^CCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATT TAAAC CATATTC TTTTAATC GAGACGATCAGGTTGC GTTGGTAATGTT TTCTTC 
TGGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCTTGCAAAAGATCCTACTTTTGGTAACGCT^TTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
AT TAGTTGAAAAGTAC GATTTATC GCACTTAAAAGAAATTGCAT C TGGTGGC GCACCTTT 
AT CAAAAGAAATTGGGGAGATGGTGAAAAAAC GGTTTAAATTAAACTTTGTC AGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAACAATGACGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTAACAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAAC7VAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGT^TTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG* 



GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACA/aATGCTCATACAAAAGAAAATGTTTTATATGATVGAGTTGTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 

ggtgtgtagcga;\aatggtttgcaatttttccttcctataattgcatcattgtatcttgg 
aataattgcagcacctgttagtgataaatacattgaacgtgaattaatacacagtcttgg 
tattgtaaaaccacgcataattttttgctccaagaatacttttcaaaaagtactgaatgt 
aaaatctaaattaaaatatgtagaaactattattatattagacttaaatgaagacttagg 
aggttatcaatgcctcaacaactttatttctcaaaattccgatattaatctggacgtaaa 
aaaatttaaaccatattcttttaatcgagacgatcaggttgcgttggtaatgttttcttc 
tggtacaactggtgtttcgt^gggagtcatgctaactcacaagaatattgttgcacgatt 
ttctcatgcaaaagatcctacttttggtaacgcaatttuvtccaacgacagcaattttaac 
ggtaatacctttccaccatggttttggtatgatgaccacattaggatactttacttgtgg 
attccgagttgttctaatgcacacgtttgaagaaaaactatttctacaatcattacaaga 
ttataaagtggaaagtactttacttgtaccaacattaatggcattttttgcaaaaagtgc 

ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGC GCACCTTT 

atcaaaagaaattggggagatggtgt^aaaaacggtttaaattat^ctttgtcaggcaagg 
gtatggattaacagaaaccacttcggctgttttaattacaccgaacaatgacgtcagacc 
gggatcaactggtaaaatagtaccatttcacgctgttaaagttgtcgatcctacaacagg 
aaaaattttggggccaaatgaaactggagaattgtattttaa7vggcgacatgataatgaa 
aggttattataataatgaagaagctactaaagcaattattaacaaagacggatggttgcg 
ctctggtgatattgcttattatgacaatgatggccatttttatattgtggacaggctgaa 
gtcattaattaaatataaaggttatcaggttgcacctgctgaaattgagggaatactctt 
acaacatccgtatattgttgatgccggcgttactggtataccggatgaagccgcgggcga 
gcttccagctgcaggtgttgtagtacagactggaaaatatctaaacgaacaaatcgtaca 
aaattttgtttccagtcaagtttct^cagccaaatggctacgtggtggggtgaaattttt 

ggatgaaattcccaaaggatcaactggaaaaattgacagaatuvgtgttaagacaaatgtt 
tgaaaaacacaccaatggg* 
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^3 

GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACC ATTTTATC CCTTGGCT GA 
TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGG?VAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGC GAAAATGGTT TGCAATTTTT CC TT CCTATAATTGC ATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACC AC GC ATAATTTTTTGCTC CAAGAATACT TTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCC GATATTAATC TT GAC GTAAA 
AAAATTTAAACCATAT TC TTTT/^ATC GAGAC GATCAGGTTGC GTTGGTAATGTT TT CTTC 
TGGTACAACTGGTGTT TC GAAGGGAGTC AT GCTAACTC ACAAGAATATTGTTGTACGATT 
TTCT CTTGCAAAAGATCC TACTTTTGGTAACGCAATTAATCCAAC GACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTTVATGCACACGTTTGT^GAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTAC GATTTATCGC ACTTAAAAGAAATTGCATC TGGTGGC GC AC CTTT 
ATCAAAAGAAAT TGGGGAGATGGTGAAAAAAC GGTT TAAATTAAACTTTGTC AGGC AAGG 
GTATGGATTAAC AGAAAC CACTTCGGCT GTTT TAAT TACACC GAACAATGAC GTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTACCAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTC CAGTCAAGTTTC AACAGCCAAATGGC TACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 




•GAAAAACACACCAATGGG* 



GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTTAC GCATTATCTC GTTATGCAGATATTTC AGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGC GAAAATGGTTTGCAATTTTTCCTTCCTATAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTATTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAACAATGACGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATG/A 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTAACA7VAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAATVATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGG7VAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG* 
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Jl 



GGATCCAATGG7VAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTT6ACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTATyVATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAATyVTGGTTTGCAATTTTTCCTTCCTATAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCATATTCTTTT7VATC GAGACGATCAGGTTGC GTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCATGCAAAAGATCCTACTTTTGGTT^CGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGAtGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTTTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAACAATGACGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATT/VACAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAT^TGGCTACGTGGTGGGGTGAAATTTTT 

GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGAC7VAATGTT 
TGAAAAACACACCAATGGG* 
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GGAT CC AATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATC CC TTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACT^AAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCGTAATTGCATCATTGTATCTTGGA 
ATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGGT 
ATTGTAAAAC CACGCATAATTTTTTGCT CCAAGAATACTTTTCAAAAAGTAC TGAATGTA 
AAATCTAAATTAAAATCTGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGGA 
GGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAAA 
AAAT TTAAAC CATATTCTTT TAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTCT 
GGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATTT 
TCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCCACGACAGCAATTTTAACG 
GTAATACCTTTCCACCATGGTTTTGGTATGAtgACCACATTAGGATACTTTACTTGTGGA 
TTCCGAGTTGTTCTAATGCACACGTTTGAAGA7Sl?lAACTATTTCTAC7\ATCATTACAAGAT 
TATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGCA 

ttagttgaaaagtacgatttatcgcacttaaaagaaattgcatctggtggcgcaccttta 

TCAAAAGTVAATTGGGGAGATGGTGAAAAAACGGTTTT^AATTAAACTTTGTCAGGCAAGGG 
TATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGCCAGACCG 

ggatcaactggtaflaatagtaccatttcacgctgttaaagttgtcgatcctacaacagga 
aaaattttggggccaaatgaacctggagaattgtattttaaaggcgccatgataatgaag 

GGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGATAATGACGGATGGTTGCGC 
TCTGGT GATATTGC TTATTATGACAATGATGGCC ATTTTTAT ATTGTGGACAGGCTGAAG 
TCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTTA 
C AACATC CGT ATAT TGTTGATGCC GGCGTT AC TGGTATTC CGGATGAAGC CGCGGGCGAG 
C TTCCAGCTGCAGGTGTTGTAGTACAGACT GGAAAATATCTAAAC GAACAAATC GTAC AA 
GATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTTG 
.GATGAAATTCCC;^AAGGATCAACTC>GAAAAATTGACAGAA;\AGTGTTAAGACAPlATGTTT 
GAAAAACACACCAATGGG* 
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t 



J 



GGATCC7ATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTGTAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCA7WUVGTACTG7\ATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATC AATGCCTCAACAACTTTATTTC TCAAAATTCCGATATTAATC TTGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTCC GAAGGGAGTCATGC TAAC TCACAAGAATATTGTTGCACGATT 
TTCTCTTGCAAAAGAT CC TACTTT TGGTAACGCAATTAATCCAACGAC AGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGTVAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGAT TAAC AGAAAC CACTTC GGCTGTTTTAATTACACC GAAAxxxxxxGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTTVAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGtATAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGGGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG* 
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-I) 

GGATCCAATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTCCCGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATATTTCCTTCCTGTAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AT^TTTAAACCAAATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTCCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTATTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAXX3CXXXGCCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCGCCATGATAATGAA 
GGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGATAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAAXATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG^ 
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GGATCCAATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTCCCGGATG 
CATAGCATTGACAAATGCTCATACAT^GAAAATGTTTTATATGAAGAGTTTTTAAAATT 

gtcgtgtcgtttagcggaaagttttaaaaagtatggattaaaacaaaacg7«:acaatagc 
ggtgtgtagcgaaaatggtttgcaatttttccttcctgtaattgcatcattgtatcttgg 

AATAATTGCAGC AC CTGTTAGTGATAAATACGTT GAAC GTGAATTAATAC AC AGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAT^TGAAGACTTAGG 
AGGTTATCAATGCCTCAAC7^CTTTATTTCTC7^AAATTCCGATAGTAATCTGGACGTA7\A 
AAAATTTAAACCAAATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC
TGGTACAACTGGTGTTCCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT
TTCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAT^TATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGCCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCGCCATGATAATGAA 

gggttattataataatgaagaagctact;\aagcaattattgataaagacggatggttgcg 
ctctggtgatattgcttattatgacaatgatggccatttttatattgtggacaggctgaa 
gtcattaattaaatataaaggttatcaggttgcacctgctgaaattgagggaatactctt 
acaacatccgtatattgttgatgccggcgttactggtataccggatgaagccgcgggcga 
gcttccagctgcaggtgttgtagtacagactgga/y^tatctaaacgaacaaatcgtaca 
a^iattttgtttccagtcaagtttcaacagccaaatggctacgtggtggggtgaaattttt 
ggatgaaattcccaaaggatcaactggaaaaattgacagaaaagtgttaagacaaatgtt 

TGAAAAACACACCAATGGG* 
GGATCCAATGGCAGATA/IlAAATATTTTATATGGGCCCGATVCCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTCCGGGCTG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTGTAATTGCATCATTGTATCTTGG 
AATAATTGTGGCACCTGTTAACGATAAATACATTGAACGTGAATTAATACACTVGTCTTGG 
TATTGTAAAACCACGCATAGTTTTTTGCTCCAAGAATACTTTTCAAA?lAGTACTGAATGT 

aaaatctaaattaaaatctgtagaaactattattatattagacttaaatgaagacttagg 
aggttatcaatgcctcaacaactttatttctcaaaattccgatattaatcttgacgtaaa 
aaaatttaaaccatattcttttaatcgagacgatcaggttgcgttgattatgttttcttc 
tggtacaactggtctgccgaagggagtcatgctaactcacaagaatattgttgcacgatt 

TTCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCCACGACAGCAATTTTAAC 

ggtaatacctttccaccatggttttggtatgatgaccacattaggatactttacttgtgg 
attccgagttgttctaatgcacacgtttgaagaaaaactatttctacaatcattacaaga 
ttataaagtggaaagtactttacttgtaccaacattaatggcatttcttgcaaaaagtgc 
attagttgaaaagtacgatttatcgcacttaaaagaaattgcatctggtggcgcaccttt 
atcaaaagaaattggggagatggtgaaaaaacggtttaaattaaactttgtcaggcaagg 

GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAXXXXX3CGCCAGACC 

gggatcaactggtaaaatagtaccatttcacgctgttaaagttgtcgatcctacaacagg 
aaaaattttggggccaaatgaacctggagaattgtattttaaaggcccgatgataatgaa 
gggttattataataatgaagaagctactaaagcaattattgataatgacggatggttgcg 
ctctggtgatattgcttattatgacaatgatggccatttttatattgtggacaggctgaa 
gtcattaattaaatataaaggttatcaggttgcacctgctgaaattgagggaatactctt 
acaacatccgtatattgttgatgccggcgttactggtattccggatgaagccgcgggcga 
gcttccagctgcaggtgttgtagtacagactggaaaatatctaaacgaactwvtcgtaca 
agattttgtttccagtcaagtttcaacagccaaatggctacgtggtggggtgaaattttt 
ggatgaaattcccaaaggatct^ctggaaaaattgacagaaaagtgttaagacaaatgtt 
tgaaaaacacaccaatggg 
GGATCCAATGGCAGATAAGAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGAAGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTCCGGGCTG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTCTGAAACT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC
GGTGTGTAGCGAAAATGGTCTGCAATTTTTCCTTCCTGTAATTGCATCATTGTATCTTGG
AATAATTGTGGCACCTGTTAACGATAAATACATTGAACGTGAATTAATACACAGTCTTGG
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATCTGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGTTAATGTTTTCTTC
TGGTACAACTGGTCTGCCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCTTGCaAAAGATCCTACTTTTGGTAACGCAATTAATCCCACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGCCAAACC 
GGGATCTACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCCCGATGATAATGAA 
GGGTTATTATAATAATGAAGAAGCTACTAAAGCTU^TTATTGATAATGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCACTGATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATTCCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AOATTATGTTGCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGATWVGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG 
GGATCCAATGGAAGATAAAAATATTTTATATGGPiCCTGAACCATTTTATCCCTTGGCTGATGGGACGGCTGGAGAACAG 
ATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATGCATAGCATTGACAAATGCTCATACAATVAGAAAATGTTT 
TATATGAAGAGTTTTTAAAATTGTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAAT 
AGCGGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTTTTVATTGCATCATTGTATCTTGGAATAATTGCAGCACCT 
GTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGGTATTGTAAAACCACGCATAATTTTTTGTTCCAAGA 
ATACTTTTCAAAAAGTACTGAATGTAAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGT^GA 
CTTAGGAGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAAAAAATTTAAACCA 
AATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTCTGGTACAACTGGTGTTTCGAAGGGAGTCATGC 
TAACTCACAAGAATATTGTTGCACGATTTTCTCATTGCAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGC 
AATTTTAACGGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGGATTCCGAGTT 
GCTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGATTATAAAGTGGAAAGTACTTTACTTGTAC 
CAACATTAATGGCATTTTTTGCAAAAAGTGCATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGG 
TGGCGCACCTTTATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGGGTATGGA 
TTAACAGAAACCACTTCGGCTGTTTTAATTACACCGGACACTGACGTCAGACCGGGATCAACTGGTAAAATAGTACCAT 
TTCACGCTGTTAAAGTTGTCGATCCTACAACAGGAAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGG 
CGACATGATAATGAAAAGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTAACAAAGACGGATGGTTGCGCTCT 
GGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAAGTCATTAATTAAATATAAAGGTT 
ATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTTACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACC 
GGATGAAGCCGCGGGCGAGCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACAA 
.AATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAT^TTTTTGGATGAAATTCCCAAAGGAT 
CAACTGGAAAAATTGACAGAATU^GTGTTAAGACAT^TGTTTGAAAAACACATVATCTAAGCTG 
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GGATCCCATGATGAAGCGAGAGAAAAATGTTATATATGGACCCGAACCCCTACACCCCTT 
GGAAGACTTAACAGCTGGAGAAATGCTCTTCCGTGCCCTTCGAAAACATTCTCATTTACC 
GCAGGCTTTAGTAGATGTGGTTGGCGACGAATCGCTTTCCTATAAAGAGTTTTTTGAAGC 
GACAGTCCTCCTAGCGCAAAGTCTCCACAATTGTGGATACAAGATGAATGATGTAGTGTC 
GATCTGCGCCGAGAATAATACAAGATTTTTTATTCCCGTTATTGCAGCTTGGTATATTGG 
TATGATTGTAGCACCTGTTAATGAAAGTTACATCCCAGATGAACTCTGTAAGGTGATGGG 
TATATCGAAACCACAAATAGTTTTTACGACAAAGAACATTTTAAATAAGGTATTGGAGGT 
ACAGAGCAGAACTAATTTCATAAAAAGGATCATCATACTTGATACTGTAGAAAACATACA 
CGGTTGTGAAAGTCTTCCCAATTTTATTTCTCGTTATTCGGATGGAAATATTGCCAACTT 
CAAACCTTTACATTTCGATCCTGTTGAGCAAGTGGCAGCTATCTTATGTTCGTCAGGCAC 
TACTGGATTACCGAAAGGTGTAATGCAAACTCACCAAAATATTTGTGTCCGACTTATACA 
TGCTTTAGACCCCAGGGCAGGAACGCAACTTATTCCTGGTGTGACAGTCTTAGTATATCT 
GCCTTTTTTCCATGCTTTTGGGTTCTCTATAACCTTGGGATACTTCATGGTGGGTCTTCG 
TGTTATCATGTTCAGACGATTTGATCAAGAAGCATTTCTAAAAGCTATTCAGGATTATGA 
AGTTCGAAGTGTAATTAACGTTCCATCAGTAATATTGTTCTTATCGAAAAGTCCTTTGGT 
TGACAAATACGATTTATCAAGTTTAAGGGAATTGTGTTGCGGTGCGGCACCATTAGC7U\A 
AGAAGTTGCTGAGGTTGCAGCAAAACGATTAAACTTGCCAGGAATTCGCTGTGGATTTGG 
TTTGACAGAATCTAGTTCAGCTAATATACACAGTCTTAGGGATGAATTTAAATCAGGATC 
ACTTGGAAGAGTTACTCCTTTAATGGCAGCTAAAATAGCAGATAGGGAAACTGGTAAAGC 
ATTGGGACCAAATCAAGTTGGTGAATTATGCATTAAAGGTCCCATGGTATCGAAAGGTTA 
CGTGAACAATGTAGAAGCTACCAAAGAAGCTATTGATGATGATGGTTGGCTTCACTCTGG 
AGACTTTGGATACTATGATGAGGATGAGCATTTCTATGTGGTGGACCGTTACAAGGAATT 
GATTAAATATAAGGGCTCTCAGGTAGCACCTGCAGAACTAGAAGAGATTTTATTGAAAAA 
TCCATGTATCAGAGAT.GTTGCTGTGGTTGGTATTCCTGATCTAGAAGCTGGAGAACTGCC 
ATCTGCGTTTGTGGTTAAACAGCCCGGAAAGGAGATTACAGCTAAAGAAGTGTACGATTA 
TCTTGCCGAGAGGGTCTCCCATACAAAGTATTTGCGTGGAG'GGGTTCGATTCGTTGATAG 
CATACCAAGGAATGTTACAGGTAAAATTACAAGAAAGGAACTTCTGAAGCAGTTGCTGGA 
GAAGGCGGGAGGT 
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GGATCCCATGATGAAGCGAGAGAAAAATGTTATATATGGACCCGAACCCCTACACCCCTT
GGAAGACTTAACAGCTGGAGAAATGCTCTTCCGTGCCCTTCGAAAACATTCTCATTTACC
GCAGGCTTTAGTAGATGTGGTTGGCGACGAATCGCTTTCCTATAAAGAGTTTTTTGAAGC 
GACAGTCCTCCTAGCGCAAAGTCTCCACAATTGTGGATACAAGATGAATGATGTAGTGTC 
GATCTGCGCCGAGAATAATACAAGATTTTTTATTCCCGTTATTGCAGCTTGGTATATTGG 
TATGATTGTAGCACCTGTTAATGAPlAGTTACATCCCAGATGAACTCTGTAAGGTGATGGG 
TATATCGAAACCACAAATAGTTTTTACGACAAAGAACATTTTAAATAAGGTATTGGAGGT 
ACAGAGCAGAACTAATTTCATAAAAAGGATCATCATACTTGATACTGTAGAAAACATACA 

cggttgtgaaagtcttcccaattttatttctcgttattcggatggaaatattgccaactt 
caaacctttacatttcgatcctgttgagcaagtggcagctatcttatgttcgtcaggcac 
tactggattaccgaaaggtgtaatgcaaactcaccaaaatatttgtgtccgacttataca 
tgctttagaccccagggcaggaacgcaacttattcctggtgtgacagtcttagtatatct 
gccttttttccatgcttttgggttctctataaccttgggatacttcatggtgggtcttcg 
tgttatcatgttcagacgatttgatcaagaagcatttctaaaagctattcaggattatga 
agttcgaagtgtaattaacgttccatcagtaatattgttcttatcgaaaagtcctttggt 

TGACAAATACGATTTATCT^AGTTTAAGGGAATTGTGTTGCGGTGCGGCACCATTAGCAAA 
AGAAGTTGCTGAGGTTGCAGCAAAACGATTAAACTTGCCAGGAATTCGCTGTGGATTTGG 
TTTGACAGAATCTACTTCAGCTAATATACACAGTCTTAGGGATGAATTTAAATCAGGATC 
ACTTGGAAGAGTTACTCCTTTAATGGCAGCTAAAATAGCAGATAGGGAAACTGGTAAAGC 
ATTGGGACCAAATCAAGTTGGTGAATTATGCATTAAAGGTCCCATGGTATCGAAAGGTTA 
CGTGAACAATGTAGAAGCTACCAAAGAAGCTATTGATGATGATGGTTGGCTTCACTCTGG 
AGACTTTGGATACTATGATGAGGATGAGCATTTCTATGTGGTGGACCGTTACAAGGAATT 
GATTAAATAT/^GGGCTCTCAGGTAGCACCTGCAGAACTAGAAGAGATTTTATTGAAAAA 
TCCATGTATCAGAGATGTTGCTGTGGTTGGTATTCCTGATCTAGAAGCTGGAGAACTGCC 
ATCTGCGTTTGTGGTTAAACAGCCCGGAAAGGAGATTACAGCTAAAGAAGTGTACGATTA 
TCTTGCCGAGAGGGTCTCCCATACAAAGTATTTGCGTGGAGGGGTTCGATTCGTTGATAG 
CATACCAAGGAATGTTACAGGTAAAATTACAAGAAAGGAACTTCTGAAGCAGTTGCTGGA 
GAAGGCGGGAGGT 
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Wood and Hall 

17. A DNA molecule having a nucleotide sequence that encodes a 
luciferase of claim 1 or 7. 

18. The use of luciferases of claims 1 or 7 in ATP assays; as 
luminescent labels for nucleic acids, proteins, or other macromolecules; as genetic 
reporters; in enzyme immobilization; as hybrid proteins; in high temperature 
reactors; and in luminescent solution. 

19. A kit comprising a beetle luciferase with a half-life of at least 2 
hours at 50°C. 

20. The kit of claim 19 used for ATP assays; as luminescent labels for 
nucleic acids, proteins, or other macromolecules; as genetic reporters; in enzyme 
immobilization; as hybrid proteins; in high temperature reactors; and in 
luminescent solution. 

21. A luciferase having an amino acid sequence consisting of 
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DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEEFLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLWKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSLAKDPTFGNAINPTTAILT 

VIPFHHGFGMOTTLGYFTCGFRVVIJ^FEEKLFI^SLQDYKVESTLLVPTLMAFIAKSA 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPNNDVRP 

GSTGKIVPFHAVKWDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIINKDGWLR 

SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE 

LPAAGVWOTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF 
EKHTNG 
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DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCI7a.TNAHTKEWLYEELLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLWKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSHAKDPTFGNAINPTTAILT

VIPFHHGFCTlMTTLGYFTCGFRVVIJ^FEEKLFLQSLQDYKVESTLLVPTimFF^^ 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVXITPNNDVRP 

GSTGKIVPFHAVKVA^PTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIINKDGWLR 

SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE 

LPAAGVWQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF

EKHTNG 
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DPMEDKNILyGPEPFYPLMGTAGEQMFYALSRYADISGCIALTNAHTKE^A^YEEFLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVTOFSLAKDPTFGNAINPTTAILT 

VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTIJ4A 

LVEKYDLSHLKEIASGGAPLSKEIGEMVT<KRFKLNFVRQGYGLTETTSAVLITPNNDVRP 

GSTGKIVPFHAVKWDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIITKDGWLR 

SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE 

LPAAG\AA^QTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF 

EKHTNG 
DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEEFLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVL^AAKSKLKYVETIIILDLNEDLGGYQCLNKFISQNSDINLDVK 

KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSIAKDPTFGNAINPTTAILT

VIPFHHGFCa^MTTLGYFTCGFRVVIJWrFEEKLFLQSLQDYKVESTLLVPTIJ^IJ^ 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPNNDVRP 

GSTGKIVPFHAVKWDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIINKDGWLR 

SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE 

LPAAGVWQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF 
EKHTITG 
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DPMEDKNILYGPEPFYPLADGTAGEQMFDALSRYADISGCIALTNAHTKENVLYEEFLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIT^PVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIV7VRFSHAKDPTFGNAINPTTAILT 

VIPFHHGFGMMTTLGYFTCGFRVVMHTFEEKLFLQSLQDYKVESTLLVPTLMAFFAKSA 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPNNDVRP 

GSTGKIVPFHAVKVVDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIINKDGWLR 

SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE 

LPAAGWVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF 
EKHTNG 
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DPMADKNILYGPEPFYPLADGTAGEQMFDALSRYADISGCIALTNAHTKENVLYEEFLKL 

SCRIAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKSVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSLAKDPTFGNAINPTTAILT 

VIPFHHGFGIWTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFLAKSA 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKxxARPG 

STGKIVPFHAVKWDPTTGKILGPNEPGELYFKGAMIMKGYYNNEEATKAIIDNDGWLRS 

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL

PAAGVWQTGKYLNEQIVQDFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVXRQMFE 
KHTMG$ 
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DPMADKNILYGPEPFYPLADGTAGEQHFYALSRYADISGCIALTNAHTKENVLYEEFLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQVALVMFSSGTTGVPKGVMLTHKNIVARFSIAKDPTFGNAINPTTAILT 

VIPFHHGFGMMTTLGYFTCGFRV^^^MHTFEEKLFLQSLQDYKVESTLLVPTLMAFIAKSA 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKmVRPG 

STGKIVPFHAVKWDPTTGKILGPNEPGELYFKGDMIMKGYYNNEEATKAIIDKDGWLRS 

gdiayydndghfyivdrlkslikykgyqvapaeiegillqhpyivdagvtgipdeaagel 

PAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVXRQMFE 
KHTNG 
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DPMADKNILYGPEPFYPLADGTAGEQMFDALSRYADIPGCIALTNAHTKENVLYEEFLKL

SCRLAESFKKYGLKQNDTIAVCSENGLQYFLPVIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPNSFNRDDQVALVMFSSGTTGVPKGVMLTHKNIVARFSIAKDPTFGNAINPTTAILT 

VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFIAKSA 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKL^fFVRQGYGLTETTSAVLITPK^cxARPG 

STGKIVPFHAVKVA/TDPTTGKILGPNEPGELYFKGAMIMKGYYNNEEATKAIIDKDGWLRS 

GDIAYYDNDGHFYIVBRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

PAAGVWQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE 
KHTNG 
DPMADKNILYGPEPFYPIADGTAGEQMFDALSRYADIPGCIALTNAHTKENVLYEEFLKL 

SCRIAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIAAPVSDKYVERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDSNLDVK 

KFKPNSFNRDDQVALVMFSSGTTGVPKGVMLTHKNIVARFSLAKDPTFGNAINPTTAILT 

VIPFHHGFQlMTTLGYFTCGFRVVIJ^TFEEKLFLQSLQDYKVESTLLVPTLMAFIi^ 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKxxARPG 

STGKIVPFHAVKVVDPTTGKILGPNEPGELYFKGAMIMKGYYNNEEATKAIIDKDGWLRS 

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

PAAGVWQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE 
KHTNG 
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DPMADKNILYGPEPFYPLADGTAGEQMFDALSRYADIPGCIALTNAHTKENVLYEEFLKL

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIVAPVNDKYIERELIHSLG 

IVKPRIVFCSKNTFQKVLNVKSKLKSVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQV7U.IMFSSGTTGIiPKGVmTHKNIVAJlFSIAraPTFGNAINPTTAILT 

VIPFHHGFCTIMTTLGYFTCGFRVVMHTFEEKLFLQSLQDYKVESTLLVPTimFIiAKSA 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFIOiNFVRQGYGLTETTSAVLITPKxxARPG 

STGKIVPFHAVKWDPTTGKILGPNEPGELYFKGPMIMKGYYNNEEATKAIIDNDGWLRS 

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

PAAGWVQTGKYLNEQIVQDFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE 
KHTNG 
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DPMADKNILYGPEPFYPLEDGTAGEQMFDALSRYADIPGCIALTNAHTKENVXYEEFLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIVAPVNDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKSVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQVALLMFSSGTTGLPKGVMLTHKNIVARFSLAKDPTFGNAINPTTAILT

VIPFHHGFGMMTTLGYFTCGFRVVTiMHTFEEKLFLQSLQDYKVESTLLVPTIJ^FL^ 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKxxAKPG 

STGKIVPFKWKWDPTTGKILGPNEPGELYFKGPMIMKGYYNNEEATKAIIDNDGWLRS 

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

_AAGVWQTGKYLNEQIVQDYVASQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE 

KHTNG 
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DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEEFLKL 
SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPLIASLYLGIIAAPVSDKYIERELIHSLG 
IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

kfkpnsfnrddqvalvmfssgttgvskgvmlthknivtuifshckdptfgnainpttailt 

vipfhhgfqimxtlgyftcgfrvaij^feeklflqslqdykvestllvptij^aff;^ 

lvekydlshlkeiasggaplskeigemvkkrfklnfvrqgygltettsavlitpdtdvrp 

gstgkivpfhavkwdpttgkilgpnetgelyfkgdmimksyynneeatkaiinkdgwlr 

sgdiayydndghfyivdrlkslikykgyqvapaeiegillqhpyivdagvtgipdeaage 

lpaagvwqtgkylneqivqnfvssqvstakwlrggvkfldeipkgstgkidrkvlrqmf 

EKHKSKL 
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DPMMKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDWGDESLSYKEFFEA 

TVLLAQSLHNCGYKMNDWSICAENNTRFFIPVIAAWYIGMIVAPVNESYIPDELCKVMG 

ISKPQIVFTTKNILNKVLEVQSRTNFIKRIIILDTVENIHGCESLPNFISRYSDGNIANF 

KPLHFDPVEQVAAILCSSGTTGLPKGVMQTHQNICVRLIHALDPRAGTQLIPGVTVLVYL 

PFFHAFGFSITLGYFMVGLRVIMTRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLV

DKYDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSANIHSLRDEFKSGS 

LGRVTPIJyiAAKIADRETGKALGPNQVGELCIKGPMVSKGYVNNVEATKEAIDDDGWLHSG 

DFGYYDEDEHFYWDRYKELIKYKGSQVAPAELEEILLKNPCIRDVAWGIPDLEAGELP 

SAFWKQPGKEITAKEVYDYLAERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLE 
KAGG 
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) ^fiT^^f'^^^^^P^HPLEDLTAGEMLFRALRKHSHLPQALVDWGDESLSYKEFFEA 

DKYDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSANIHSLRDEFKSGS 

DFGYYDEDEHFYWDRYKELIKYKGSQVAPAELEEILLKNPCIRDVAWGIPDLEAGELP 
SAFWKQPGKEITAKEVYDYLAERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLE 
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22. The luciferase of claim 21 further characterized as having a half-life
of 2 hours at 50°C.
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[Figure axis labels: Remaining Activity; Normalized Light Units; Log Luminescence]
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[Figure legend]
22C 78-0B10 Projected 1/2-life = 109 days
22C 49-7C6 Projected 1/2-life = 64 days
37C 49-7C6 Calculated 1/2-life = 111 hours
22C Ppe-2 (w.t.) Calculated 1/2-life = 27.7 days
37C Ppe-2 (w.t.) 1/2-life = ~4 min.
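
The legend entries above quote projected and calculated half-lives (for example, 27.7 days at 22C for wild-type Ppe-2). The source does not state how these values were fitted. The following is only a minimal sketch, assuming simple first-order decay, so that the natural log of remaining activity falls linearly with time and the half-life is ln(2) divided by the decay rate. The function name `half_life` and the sample time course are hypothetical and are not data from the figures.

```python
import math


def half_life(times, activities):
    """Estimate a half-life from a remaining-activity time course.

    Assumes first-order decay: ln(activity) = ln(A0) - k * t.
    Fits the slope by ordinary least squares and returns ln(2) / k,
    in the same time unit as `times`.
    """
    if len(times) != len(activities) or len(times) < 2:
        raise ValueError("need at least two matching (time, activity) points")
    logs = [math.log(a) for a in activities]  # activities must be > 0
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("activity does not decay over the sampled interval")
    return math.log(2) / -slope


# Hypothetical time course (hours, fraction of initial luminescence),
# constructed so that activity halves every 2 hours.
times_h = [0.0, 1.0, 2.0, 4.0]
remaining = [1.0, 0.5 ** 0.5, 0.5, 0.25]

estimate_h = half_life(times_h, remaining)
print(f"estimated half-life: {estimate_h:.2f} h")
```

Under this assumption, a "projected" half-life would simply extrapolate the fitted line beyond the sampled interval, while a "calculated" one lies within it; that reading of the legend wording is itself an assumption.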
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[Figure residue, assay flow diagram. Recoverable labels, in order: IN VIVO LUMINESCENCE; 20 µl Buffer A; (covered); Store in; (covered); 15 µl Lysate; TIME = 0 hrs; ASSAY KINETICS (S)]
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FIGURE 16 
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FIGURE 18A 
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FIGURE 18B 



WO 99/14336 PCT/US98/19494

20/59

FIGURE 18C



FIGURE 19 



1 50 

Lcr MENMENDE.N IWGPKPFYP lEEGSAGTQL RKYMERYAKL .GAIAFTNAV 

Lla MENMENDE.N IVYGPEPFYP lEEGSAGAQL RKYMDRYAKL .GAIAFTNAL 

Lmi ME MEKEE.N WYGPLPFYP lEEGSAGIQL HKYMHQYAKL .GAIAFSNAL 

Pmi ...MEDDSKH IMHGHRHSIL WEDGTAGEQL HKAMKRYAQV PGTIAFTDAH 

Ppy MED.AKN IKKGPAPFYP LEDGTAGEQL HKAMKRYALV PGTIAFTDAH 

Lno .MED.AKN IMHGPAPFYP LEDGTAGEQL HKAMKRYAQV PGTIAFTDAH 

Ppel MSI.ENN ILIGPPPYYP LEEGTAGEQL HRAISRYAAV PGTLAYTDVH 

Phg MIKME..EEH VMPGAMPRDL LFEGTAGQQL HRALYKHSYF PE. .AIVDSH 

GR ...MMKREKN WYGPEPLHP LEDLTAGEML FRALRKHSHL PQ. .ALVDVY 

YG MMKREKN VIYGPEPLHP LEDLTAGEML FRALRKHSHL PQ. .ALVDVF 

Ppe2 '"med..kn ILYGPEPFYP LADGTAGEQM FYALSRYADI SGCIALTNAH 



4 9-7c6 
78-OblO A 
90-lb5 A 



D P 
D P 



cons ---M G AG A---- 

Lcr TGVDYSYAEY LEKSCCLGKA LQNYGLWDG RIALCSENCE EFFIPVIAGL 
Lla TGVDYTYAEY LEKSCCLGEA LKNYGLWDG RIALCSENCE EFFIPVLAGL 
Lmi TGVDISYQEY FDITCRLAEA MKNFGMKPEE HIALCSENCE EFFIPVLAGL 
Pmi AEVNITYSEY FEMSCRLAET MKRYGLGLQH HIAVCSETSL QFFMPVCGAL 
Ppy lEVNITYAEY FEMSVRLAEA MKRYGLNTNH RIWCSENSL QFFMPVLGAL 
Lno AEVNITYSEY FEMACRLAET MKRYGLGLQH HIAVCSENSL QFFMPVCGAL 
Ppel TELEVTYKEF LDVTCRLAEA MKNYGLGLQH TISVCSENCV QFFMPICAAL 
Phg THEIISYAKI LDMSCRLAVS FQKYGLTQNN IIGICSENNL NFFNPVIAAF 
GR GEEWISYKEF FETTCLLAQS LHNCGYKMSD VVSICAENNK RFFVPIIAAW 
YG GDESLSYKEF FEATCLLAQS LHNCGYKMND VVSICAENNK RFFIPIIAAW 
Ppe2 TKENVLYEEF LKLSCRLAES FKKYGLKQND TIAVCSENGL QFFLPLIASL 
4 9-7C6 I 
78-OblO ^ 
90-lb5 ^ 

cons Y L G C-E ^^-^' — l 

101 

Lcr FIGVGVAPTN EIYTLRELVH SLGISKPTIV FSSKKGLDKV ITVQKTVTTI 
Lla FIGVGVAPTN EIYTLRELVH SLGISKPTIV FSSKKGLDKV ITVQKTVATI 
Lmi YIGVAVAPTN EIYTLRELNH SLGIAQPTIV FSSRKGLPKV LEVQKTVTCI
Pmi FIGVGVAPTN DIYNERELYN SLFISQPTIV FCSKRALQKI LGVQKKLPVI 
Ppy FIGVAVAPAN DIYNERELLN SMNISQPTVV FVSKKGLQKI LNVQKKLPII 
Lno FIGVGVASTN DIYNERELYN SLSISQPTIV SCSKRALQKI LGVQKKLPII 
Ppel YVGVATAPTN DIYNERELYN SLSISQPTVV FTSRNSLQKI LGVQSRLPII 
Phg YLGITVATVN DTYTDRELSE TLNITKPQML FCSKQSLPIV MKTMKIMPYV 
GR YIGMIVA^VN EGYIPDELCK VMGISRPQLV FCTKNILNKV LEVQSRTDFI 
YG YIGMIVAPVN ESYIPDELCK VMGISKPQIV FCTKNILNKV LEVQSRTNFI 
Ppe2 YLGIXAAPV5 DKYIERELIH SLGIVKPRII FCSKNTFQKV LNVKSKLKYV 
49-7C6

78-OblO I 
90-1B5 V N V Q SI 

Cons --C; A --V EL-- I--P 
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151 200
Lcr KTIVILDSKV DYRGYQCLDT FIKRNTPPGF QASSFKTVEV .DRKEQVALI
Lla KTIVILDSKV DYRGYQSMDN FIKKNTPQGF KGSSFKTVEV .NRKEQVALI
Lmi KKIVILDSKV NFGGHDCMET FIKKHVELGF QPSSFVPIDV KNRKQHVALL
Pmi QKIVILDSRE DYMGKQSMYS FIESHLPAGF NEYDYIPDSF .DRETATALI
Ppy QKIIIMDSKT DYQGFQSMYT FVTSHLPPGF NEYDFVPESF .DRDKTIALI
Lno QKIVILDSRE DYMGKQSMYS FIESHLPAGF NEYDYIPDSF .DRETATALI
Ppel KKIIILDGKK DYLGYQSMQS FMKEHVPANF NVSAFKPLSF .DLDR.VACI
Phg QKLLIIDSMQ DIGGIECVHS FVSRYTDEHF DPLKFVPLDF .DPREQVALI
GR KRIIILDAVE NIHGCESLPN FISRYSDGNI A..NFKPLHY .DPVEQVAAI
YG KRIIILDTVE NIHGCESLPN FISRYSDGNI A..NFKPLHY .DPVEQVAAI
Ppe2 ETIIILDLNE DLGGYQCLNN FISQNSDINL DVKKFKPNSF .NRDDQVALV
49-7C6 Y
78-0B10 Y
90-1B5 Y
Cons



Lcr 

Lla 

Lmi 

Pmi 

Ppy 

Lno 
Ppel 

Phg 

GR 

YG 
Ppe2 
49-7c6 
78-OblO 
90-lb5 

Cons 

Lcr 
Lla 
Lmi 
Pmi 
Ppy 
Lno 
Ppel 
Phg 
GR 
YG 
Ppe2 
49-7c6 



I-D 

201 

MNSSGSTGLP 
MNSSGSTGLP 
MNSSGSTGLP 
MNSSGSTGLP 
MNSSGSTGLP 
MNSSGSTGLP 
MNSSGSTGLP 
MTSSGTTGLP 
LCSSGTTGLP 
LCSSGTTGLP 
MFSSGTTGVS 



KGVQLTHENT 
KGVQLTHENA 
KGVRITHEGA 
KGVDLTHMNV 
KGVALPHRTA 
KGVELTHQNV 
KGVPISHRNT 
KGVMLTHRNI 
KGVMQTHRNV 
KGVMQTHQNI 
KGVMLTHKNI 



LP 

— SSG-TG — 
251 

GFGMFTTLGY 
GFGMFTTLGY 
GFGMFTTLGY 
VFQMFTTLGY 
GFGMFTTLGY 
GFGMFTTLGY 
AFGTFTNLGY 
AFGMFTTLSY 
AFGFSINLGY 
AFGFSINLGY 
GFGMTTTLGY 
M 



VTRFSHARDP 
VTRFSHARDP 
VTRFSHAKDP 
CVRFSHCRDP 
CVRFSHARDP 
CVRFSHCRDP 
lYRFSHCRDP 
CVRFVHSRDP 
CVRLIHALDP 
CVRLIHALDP 
VARFSHCKDP 

LA 

LA 

LA 



lYGNQVSPGT 
lYGNQVSPGT 
lYGNQVSPGT 
VFGNQIIPDT 
IFGNQIIPDT 
VFGNQIIPDT 
VFGNQIIPDT 
LFGTRFIPET 
RVGTQLIPGV 
RAGTQLIPGV 
TFGNAINPTT 



L(I) 

A— 
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AVLTWPFHH 
AILTWPFHH 
AILTWPFHH 
AILTVIPFHH 
AILSWPFHH 
AILTVIPFHH 
TILCAVPFHH 
SILSLVPFHH 
TVLVYLPFFH 
TVLVYLPFFH 
AILTVIPFHH 



78-OblO M 
90-lb5 M 
Cons -F 



KGV H 

LICGFRVVML 
LTCGFRIVML 
FACGYRVVML 
LTCGFRIVLM 
LICGFRWLM 
LTCGFRIVLM 
LICGFHVVLM 
FIVGLKIVMM 
mVGLRVIML 
FMVGLRVIML 
FTCGFRVALM 
V 
V 
V 



TKFDEETFLK 
TKFDEETFLK 
TKFDEELFLR 
YRFEEELFLR 
YRFEEELFLR 
YRFEEELFLR 
YRFNEHLFLQ 
KRFDGELFLK 
RRFDQEAFLK 
RRFDOEAFLK 
HTFEEKLFLQ 



TLQDYKCTSV 
TLQDYKCSSV 
TLQDYKCTSV 
SLQDYKIQSA 
SLQDYKIQSA 
SLQDYKIQSA 
TLQDYKCQSA 
TIQNYKIPTI 
AIQDYEVRSV 
AIQDYEVRSV 
SLQDYKVEST 



— L PF-H 
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ILVPTLFAIL 
ILVPTLFAIL 
ILVPTLFAIL 
LLVPTLFSFF 
LLVPTLFSFF 
LLVPTLFSFF 
LLVPTVLAFL 
VIAPPVMVFL 
INVPAIILFL 
INVPAIILFL 
LLVPTLMAFF 
L 
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301 350
Lcr NKSELLNKYD
Lla NRSELLDKYD
Lmi NKSELIDKFD
Pmi AKSTLVDKYD
Ppy AKSTLIDKYD
Lno AKSTLVDKYD
Ppel AKNPLVDKYD
Phg AKSHLVDKYD
GR SKSPLVDKYD
YG SKSPLVDKYD
Ppe2 AKSALVEKYD
49-7C6
78-0B10
90-1B5
Cons L--K-D



LS E G -APL— 



R— L G-GLTE- 



400 

Lcr tILiitpeg DDKPGASGKV vplfkakvid ldtkkslgpn ^^^EVCVK^ 

Lla TSAIIITPEG DDKPGASGKV VPLFKAKVID LDTKKTLGPN RRGEVCVKGP 

til ?safH?peg Sdkpgasgkv vplfkvkvid ldtkktlgvn l^l^^^l 

Pn.i TSAIIITPEG DDKPGACGKV VPFFTAKIVD LDTGKTLGVN Q^^ELCVKG^ 
ppy TSAILITPEG DDKPGAVGKV VPFFEAKWD LDTGKTLGVN ^GELCVRGP 
Lno TSAIIITPEG DDKPGACGKV VPFFSAKIVD LDTGKTLGVN QRGELCVKGP 
Ppel TCAIVITAEG EFKLGAVGKV VPFYSLKVLD LNTGKKLGPN ERGEICFKGP 
Phi CCAVLITPHN KIKTGSTGQV LPYVTAKIVD TKTGKNLGPN QTGELCFKSD 
GR^ TSANIHSLRD EFKSGSLGRV TPLMAAKIAD RETGKALGPN QVGELCIKGP 
YG TSANIHSLGD EFKSGSLGRV TPLMAAKIAD RETGKALGPN QVGELCVKGP 
Ppe2 TSAVLITPDT DVRPGSTGKI VPFHAVKWD PTTGKILGPN ETGELYFKGD 
49-7C6 NN PA 

78-OblO XX A P P 

90-1B5 XX A 



Cons --A G — G- 



_K— D — T-K-LG-N — GE 



401 

Lcr MLMKGYVNNP EATKELIDEE 
Lla MLMKGYVDNP EATREIIDEE 
Lmi SLMLGYSNNP EATRETIDEE 
Pmi MIMKGYVNNP EATNALIDKD 
Ppy MIMSGYVNNP EATNALIDKD 
Lno MIMKGYVNNP EATSALIDKD 
Ppel MIMKGYINNP EATRELIDEE 
Phg IIMKGYYQNE EETRLVIDKD 
GR MVSKGYVNNV EATKEAIDDD 
YG MVSKGYVNNV EATKEAIDDD 
Ppo2 MIMKSYYNNE EATKAIINKD 
49-"'C6 G 

7R-0bl0 r. DN 



450 

GWLHTGDIGY YDEEKHFFIV DRLKSLIKYK 
GWLHTGDIGY YDEEKHFFIV DRLKSLIKYK 
GWLHTGDIGY YDEDEHFFIV DRLKSLIKYK 
GWLHSGDIAY YDKDGHFFIV DRLKSLIKYK 
GWLHSGDIAY WDEDEHFFIV DRLKSLIKYK 
GWLHSGDIAY YDKDGHFFIV DRLKSLIKYK 
GWIHSGDIGY FDEDGHVYIV DRLKSLIKYK 
GWLHSGDIGY YDTDGNFHIV DRLKELIKYK 
GWLHSGDFGY YDEDEHFYVV DRYKELIKYK 
GWLHSGDFGY YDEDEHFYVV DRYKELIKYK 
GWLRSGDIAY YDNDGHFYIV DRLKSLIKYK 
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451 

Lcr GYQVPPAELE SVLLQHPSIF 
Lla GYQVPPAELE SVLLQHPNIF 
Lmi GYQVPPAELE SVLLQHPNIF 
Pmi GYQVPPAELE SILLQHPFIF 
Ppy GYQVAPAELE SILLQHPNIF 
Lno GYQVPPAELE SILLQHPFIF 
Ppel GYQVPPAELE ALLLQHPFIE 
Phg AYQVAPAELE ALLLQHPYIA 
GR GSQVAPAELE EILLKNPCIR 
YG GSQVAPAELE EILLKNPCIR 
Ppe2 GYQVAPAEIE GILLQHPYIV 

49-7c6 

78-OblO 

90-lb5 



500 

DAGVAGVPDP VAGELPGAW VLESGKNMTE 
DAGVAGVPDP lAGELPGAW VLEKGKSMTE 
DAGVAGVPDP DAGELPGAW VMEKGKTMTE 
DAGVAGIPDP DAGELPAAW VLEEGKMMTE 
DAGVAGLPOD DAGELPAAW VLEHGKTMTE 
DAGVAGIPDP DAGELPAAW VLEEGKTMTE 
DAGVAGVPDE VAGDLPGAW VLKEGKSITE 
DAGVTGIPDE EAGELPAACV VLEPGKTMTE 
DVAWGIPDL EAGELPSAFV VIQPGKEITA 
DVAWGIPDL EAGELPSAFV VKQPGKEITA 
DAGVTGIPDE AAGELPAAGV WQTGKYLNE 



Cons — QV-PAE-E --LL~P-I- D— V-G-PD- -AG-LP-A-V V GK 



Lcr KEVMDYVASQ VSNAKRLRGG VRFVDEVPKG LTGKIDGRA. IREILKKPV. 
Lla KEVMDYVASQ VSNAKRLRGG VRFVDEVPKG LTGKIDGKA. IREILKKPV. 
Lmi KEIVDYVNSQ WNHKRLRGG VRFVDEVPKG LTGKIDAKV. IREILKKPQ. 
Pmi QEVMDYVAGQ VTASKRLRGG VKFVDEVPKG LTGKIDSRK. IREILTMGQK 
Ppy KEIVDYVASQ VTTAKKLRGG WFVDEVPKG LTGKLDARK. IREILIKAKK 
Lno QEVMDYVAGQ VTASKRLRGG VKFVDEVPKG LTGKIDGRK. IREILMMGKK 
Ppel KEIQDYVAGQ VTSSKKLRGG VEFVKEVPKG FTGKIDTRK. IKEILIKAQK 
Phg KEVMDYIAER VTPTKRLRGG VLFVNNIPKG ATGKLVRTE. LRRLLTQRA. 
GR KEVYDYLAER VSHTKYLRGG VRFVDSIPRN VTGKITRKEL LKQLLEKS., 
YG KEVYDYLAER VSHTKYLRGG VRFVDSIPRN VTGKITRKEL LKQLLEKS.. 
Ppe2 QIVQNFVSSQ VSTAKWLRGG VKFLDEIPKG STGKIDRKV. LRQMFEKH . . 
49-7c6 

78-OBlO D 
90-lb5 DY A 

Cons V K-LRGG V-F P TGK 
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, .AKM 
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, . SKL 


Ppy G. 


. .GKSKL 
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Ppel GKSKSKAKL 
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GR 


GKL 




. SKL 


Ppe2 . . 


K5K' 


49-7c6 


TNG- 


70-nblO 


TMC * 
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Key:

Lcr: Luciola cruciata
Lla: Luciola lateralis
Lmi: Luciola mingrelica
Pmi: Pyrocoelia miyako
Ppy: Photinus pyralis
Lno: Lampyris noctiluca
Ppe-1: Photuris pennsylvanica (1)
Phg: Phengodes sp.
GR: Pyrophorus plagiophthalamus (green)
YG: Pyrophorus plagiophthalamus (yellow green)
Ppe-2: Photuris pennsylvanica (2)
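
The "Cons" row of the alignment appears to show a residue only where all eleven beetle luciferases in the key agree. The source does not state the rule, so the following is a minimal sketch under that assumption. The function name `consensus` is hypothetical, and the rows are residues 201-210 as read from the OCR text of Figure 19, so any transcription errors there carry over.

```python
def consensus(rows, gap="-"):
    """Return one character per alignment column.

    A column yields its residue when every row carries the same
    character, and `gap` otherwise. Rows must already be aligned,
    that is, all of equal length.
    """
    if not rows:
        raise ValueError("need at least one aligned row")
    width = len(rows[0])
    if any(len(row) != width for row in rows):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        column[0] if len(set(column)) == 1 else gap
        for column in zip(*rows)
    )


# Residues 201-210, in the order of the key (transcribed from Figure 19).
block_201_210 = [
    "MNSSGSTGLP",  # Lcr
    "MNSSGSTGLP",  # Lla
    "MNSSGSTGLP",  # Lmi
    "MNSSGSTGLP",  # Pmi
    "MNSSGSTGLP",  # Ppy
    "MNSSGSTGLP",  # Lno
    "MNSSGSTGLP",  # Ppe-1
    "MTSSGTTGLP",  # Phg
    "LCSSGTTGLP",  # GR
    "LCSSGTTGLP",  # YG
    "MFSSGTTGVS",  # Ppe-2
]

print(consensus(block_201_210))  # prints --SSG-TG--
```

The printed string agrees with the fragment "SSG-TG" that survives in the figure's Cons row for this block, which is consistent with the strict all-rows-agree rule assumed here but does not prove it for the other blocks.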
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FIGURE 20 
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FIGURE 22 



GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTATAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAACAATGACGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTAACAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG*
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FIGURE 23 



GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTGTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTATAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTGGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCATGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTTTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAACAATGACGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTAACAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG *
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FIGURE 24 



GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 

TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 

CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 

GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 

GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTATAATTGCATCATTGTATCTTGG 

AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 

TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 

AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 

AGGTTATCAATGCCTCAACAACTTTATTTCrCAAAATTCCGATATTAATOT 

AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 

TGGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGTACGATT

TTCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 

GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 

ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 

TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAA^ 

ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 

ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 

GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAACAATGACGTCAGACC 

GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 

AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 

AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTACCAAAGACGGATGGTTGCG 

CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 

GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 

ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 

GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 

AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 

GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 

TGAAAAACACACCAATGGG * 
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FIGURE 25 



GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTATAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTATTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAACAATGACGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTAACAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG *
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FIGURE 26 



GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTATAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCATGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGAtGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTTTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAACAATGACGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTAACAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG *
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FIGURE 27 



DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEEFLKL 
SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG 
IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK
KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSLAKDPTFGNAINPTTAILT
VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFLAKSA
LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPNNDVRP
GSTGKIVPFHAVKVVDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIINKDGWLR
SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE
LPAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF
EKHTNG 
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FIGURE 28 



DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEELLKL 
SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG 
IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK
KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSHAKDPTFGNAINPTTAILT
VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFFAKSA
LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKNDVRP
GSTGKIVPFHAVKVVDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIINKDGWLR
SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE
LPAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF

EKHTNG 
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FIGURE 29 



DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEEFLKL 
SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG 
IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 
KFKPTSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVVRFSLAKDPTFGNAINPTTAILT
VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFFAKSA
LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPNNDVRP
GSTGKIVPFHAVKVVDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIITKDGWLR
SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE
LPAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF

EKHTNG 
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FIGURE 30 



DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEEFLKL 
SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG
IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK
KFKPTSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSIAKDPTFGNAINPTTAILT
VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFLAKSA
LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPNNDVRP
GSTGKIVPFHAVKVVDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIINKDGWLR
SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE
LPAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF

EKHTKG 
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FIGURE 31 



DPMEDKNILYGPEPFYPLADGTAGEQMFDALSRYADISGCIALTNAHTKENVLYEEFLKL 
SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPIIASLYLGIIAAPVSDKYIERELIHSLG 
IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 
KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSHAKDPTFGNAINPTTAILT 
VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFFAKSA
LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPNNDVRP
GSTGKIVPFHAVKVVDPTTGKILGPNETGELYFKGDMIMKGYYNNEEATKAIINKDGWLR
SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE
LPAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF
EKHTNG 
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FIGURE 32 



GGATCCAATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCGTAATTGCATCATTGTATCTTGGA 
ATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGGT 
ATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGTA
AAATCTAAATTAAAATCTGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGGA 
GGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAAA 
AAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTCT 
GGTACAACTGGTGTTTCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATTT 
TCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCCACGACAGCAATTTTAACG 
GTAATACCTTTCCACCATGGTTTTGGTATGAtgACCACATTAGGATACTTTACTTGTGGA 
TTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGAT 
TATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGCA 
TTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTTA 
TCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGGG 
TATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGCCAGACCG
GGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGGA
AAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCGCCATGATAATGAAG 
GGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGATAATGACGGATGGTTGCGC 
TCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAAG 
TCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTTA 
CAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATTCCGGATGAAGCCGCGGGCGAG 
CTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACAA 
GATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTTG 
GATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTTT 

GAAAAACACACCAATGGG *



WO 99/14336 PCT/US98/19494






FIGURE 33 



GGATCCAATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTGTAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTCCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGTCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCGACATGATAATGAA 
AGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGATAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGGGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG * 
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FIGURE 34 



GGATCCAATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTCCCGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATATTTCCTTCCTGTAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCAAATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC
TGGTACAACTGGTGTTCCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTATTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGCCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCGCCATGATAATGAA 
GGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGATAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG* 






FIGURE 35 



GGATCCAATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTCCCGGATG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTGTAATTGCATCATTGTATCTTGG 
AATAATTGCAGCACCTGTTAGTGATAAATACGTTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATAGTAATCTGGACGTAAA 
AAAATTTAAACCAAATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTC 
TGGTACAACTGGTGTTCCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCAACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGCCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCGCCATGATAATGAA 
GGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGATAAAGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AAATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG* 
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FIGURE 36 



DPMADKNILYGPEPFYPLADGTAGEQMFDALSRYADISGCIALTNAHTKENVLYEEFLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKSVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPYSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSLAKDPTFGNAINPTTAILT 

VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFIAKSA

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKXXARPG 

STGKIVPFHAVKVVDPTTGKILGPNEPGELYFKGAMIMKGYYNNEEATKAIIDNDGWLRS

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

PAAGVVVQTGKYLNEQIVQDFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE

KHTNG$ 
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FIGURE 37 



DPMADKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEEFLKL
SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIAAPVSDKYIERELIHSLG 
IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 
KFKPYSFNRDDQVALVMFSSGTTGVPKGVMLTHKNIVARFSIAKDPTFGNAINPTTAILT 
VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFLAKSA
LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKxxVRPG
STGKIVPFHAVKVVDPTTGKILGPNEPGELYFKGDMIMKGYYNNEEATKAIIDKDGWLRS
GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL
PAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE

KHTNG 
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FIGURE 38 



DPMADKNILYGPEPFYPLADGTAGEQMFDALSRYADIPGCIALTNAHTKENVLYEEFLKL

SCRLAESFKKYGLKQNDTIAVCSENGLQYFLPVIASLYLGIIAAPVSDKYIERELIHSLG 

IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPNSFNRDDQVALVMFSSGTTGVPKGVMLTHKNIVARFSIAKDPTFGNAINPTTAILT

VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFIAKSA 

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKxxARPG 

STGKIVPFHAVKVVDPTTGKILGPNEPGELYFKGAMIMKGYYNNEEATKAIIDKDGWLRS

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

PAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE

KHTHG 
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FIGURE 39 



DPMADKNILYGPEPFYPLADGTAGEQMFDALSRYADIPGCIALTNAHTKENVLYEEFLKL

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIAAPVSDKYVERELIHSLG

IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDSNLDVK

KFKPNSFNRDDQVALVMFSSGTTGVPKGVMLTHKNIVARFSIAKDPTFGNAINPTTAILT

VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFLAKSA

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKxxARPG

STGKIVPFHAVKVVDPTTGKILGPNEPGELYFKGAMIMKGYYNNEEATKAIIDKDGWLRS

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

PAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE

KHTNG 
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FIGURE 40 



GGATCCAATGGCAGATAAAAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGCTGA 
TGGGACGGCTGGAGAACAGATGTTTGACGCATTATCTCGTTATGCAGATATTCCGGGCTG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTTTAAAATT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTGTAATTGCATCATTGTATCTTGG 
AATAATTGTGGCACCTGTTAACGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAGTTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATCTGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGATTATGTTTTCTTC 
TGGTACAACTGGTCTGCCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCTTGCAAAAGATCCTACTTTTGGTAACGCAATTAATCCCACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGCCAGACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCCCGATGATAATGAA 
GGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGATAATGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCATTAATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATTCCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AGATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG 
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FIGURE 41 



DPMADKNILYGPEPFYPLADGTAGEQMFDALSRYADIPGCIALTNAHTKENVLYEEFLKL

SCRIAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIVAPVNDKYIERELIHSLG 

IVKPRIVFCSKNTFQKVLNVKSKLKSVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPTSFNRDDQVALIMFSSGTTGLPKGVMLTHKNIVARFSIAKDPTFGNAINPTTAILT 

VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFLAKSA

LVEKYDLSHLKEIASGGAFLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKxxARPG 

STGKIVPFHAVKVVDPTTGKILGPNEPGELYFKGPMIMKGYYNNEEATKAIIDMDGWLRS

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

PAAGVVVQTGKYLNEQIVQDFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE

KHTHG 
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FIGURE 42 



GGATCCAATGGCAGATAAGAATATTTTATATGGGCCCGAACCATTTTATCCCTTGGAAGA 
TGGGACGGCTGGAGAACAGATGTTTCACGCATTATCTCGTTATGCAGATATTCCGGGCTG 
CATAGCATTGACAAATGCTCATACAAAAGAAAATGTTTTATATGAAGAGTTTCTGAAACT 
GTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAATAGC 
GGTGTGTAGCGAAAATGGTCTGCAATTTTTCCTTCCTGTAATTGCATCATTGTATCTTGG 
AATAATTGTGGCACCTGTTAACGATAAATACATTGAACGTGAATTAATACACAGTCTTGG 
TATTGTAAAACCACGCATAATTTTTTGCTCCAAGAATACTTTTCAAAAAGTACTGAATGT 
AAAATCTAAATTAAAATCTGTAGAAACTATTATTATATTAGACTTAAATGAAGACTTAGG 
AGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAA 
AAAATTTAAACCATATTCTTTTAATCGAGACGATCAGGTTGCGTTGTTAATGTTTTCTTC 
TGGTACAACTGGTCTGCCGAAGGGAGTCATGCTAACTCACAAGAATATTGTTGCACGATT 
TTCTCTTGCaAAAGATCCTACTTTTGGTAACGCAATTAATCCCACGACAGCAATTTTAAC 
GGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGG 
ATTCCGAGTTGTTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGA 
TTATAAAGTGGAAAGTACTTTACTTGTACCAACATTAATGGCATTTCTTGCAAAAAGTGC 
ATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTGCATCTGGTGGCGCACCTTT 
ATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGG 
GTATGGATTAACAGAAACCACTTCGGCTGTTTTAATTACACCGAAAxxxxxxGCCAAACC 
GGGATCAACTGGTAAAATAGTACCATTTCACGCTGTTAAAGTTGTCGATCCTACAACAGG 
AAAAATTTTGGGGCCAAATGAACCTGGAGAATTGTATTTTAAAGGCCCGATGATAATGAA 
GGGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTGATAATGACGGATGGTTGCG 
CTCTGGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAA 
GTCACTGATTAAATATAAAGGTTATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTT 
ACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATTCCGGATGAAGCCGCGGGCGA 
GCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACA 
AGATTATGTTGCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTT 
GGATGAAATTCCCAAAGGATCAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTT 
TGAAAAACACACCAATGGG 






FIGURE 43 



DPMADKNILYGPEPFYPLEDGTAGEQMFDALSRYADIPGCIALTNAHTKENVLYEEFLKL 

SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPVIASLYLGIIVAPVNDKYIERELIHSLG

IVKPRIIFCSKNTFQKVLNVKSKLKSVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 

KFKPTSFNRDDQVALLMFSSGTTGLPKGVMLTHKNIVARFSLAKDPTFGNAINPTTAILT 

VIPFHHGFGMMTTLGYFTCGFRVVLMHTFEEKLFLQSLQDYKVESTLLVPTLMAFLAKSA

LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPKxxAKPG

STGKIVPFHAVKVVDPTTGKILGPNEPGELYFKGPMIMKGYYNNEEATKAIIDNDGWLRS

GDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGEL 

PAAGVVVQTGKYLNEQIVQDYVASQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMFE

KHTNG 
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FIGURE 44 



GGATCCAATGGAAGATAAAAATATTTTATATGGACCTGAACCATTTTATCCCTTGGCTGATGaSACGGCT^ 

ATGTTTTACGCATTATCTCGTTATGCAGATATTTCAGGATGCATAGCATTGACAAATGCTCATACAAAAGAAAATGTTT 

TATATGAAGAGTTTTTAAAATTGTCGTGTCGTTTAGCGGAAAGTTTTAAAAAGTATGGATTAAAACAAAACGACACAAT 

AGCGGTGTGTAGCGAAAATGGTTTGCAATTTTTCCTTCCTTTAATTGCATCATTGTATCTTGGAATAATTGCAGCACCT 

GTTAGTGATAAATACATTGAACGTGAATTAATACACAGTCTTGGTATTGTAAAACCACGCATAATTTTTTCT 

ATACTTTTCAAAAAGTACTGAATGTAAAATCTAAATTAAAATATGTAGAAACTATTATTATATTAGACTTAAATGAAGA 

CTTAGGAGGTTATCAATGCCTCAACAACTTTATTTCTCAAAATTCCGATATTAATCTTGACGTAAAAAAATTTAAA 

AATTCTTTTAATCGAGACGATCAGGTTGCGTTGGTAATGTTTTCTTCTGGTACAACTGGTGTTTCGAAGGGAGTC^^ 

TAACTCACAAGAATATTGTTGCACGATTTTCTCATTGCAAAGATCCTACTTTTGGTAACGCAATTAATCCAAC^^ 

AATTTTAACGGTAATACCTTTCCACCATGGTTTTGGTATGATGACCACATTAGGATACTTTACTTGTGGATTCC^^ 

GCTCTAATGCACACGTTTGAAGAAAAACTATTTCTACAATCATTACAAGATTATAAAGTGGAAAGTACTTT^ 

aUVCATTAATGGCATTTTTTGCAAAAAGTGCATTAGTTGAAAAGTACGATTTATCGCACTTAAAAGAAATTG^^ 

TGGCGCACCTTTATCAAAAGAAATTGGGGAGATGGTGAAAAAACGGTTTAAATTAAACTTTGTCAGGCAAGGGTATGGA 

TTAACAGAAACCACTTCGGCTGTTTTAATTACACCGGACACTGACGTCAGACCGGGATCAACTGGTAAAATAGTACCAT 

TTCACGCTGTTAAAGTTGTCGATCCTACAACAGGAAAAATTTTGGGGCCAAATGAAACTGGAGAATTGTATTTTAAAGG 

CGACATGATAATGAAAAGTTATTATAATAATGAAGAAGCTACTAAAGCAATTATTAACAAAGACGGATGGTTGCGCT 

GGTGATATTGCTTATTATGACAATGATGGCCATTTTTATATTGTGGACAGGCTGAAGTCATTAATTAAATATAAAGGTT 

ATCAGGTTGCACCTGCTGAAATTGAGGGAATACTCTTACAACATCCGTATATTGTTGATGCCGGCGTTACTGGTATACC 

GGATGAAGCCGCGGGCGAGCTTCCAGCTGCAGGTGTTGTAGTACAGACTGGAAAATATCTAAACGAACAAATCGTACAA 

AATTTTGTTTCCAGTCAAGTTTCAACAGCCAAATGGCTACGTGGTGGGGTGAAATTTTTGGATGAAATTCCCAAAGGAT 

CAACTGGAAAAATTGACAGAAAAGTGTTAAGACAAATGTTTGAAAAACACAAATCTAAGCTG 
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FIGURE 45 



DPMEDKNILYGPEPFYPLADGTAGEQMFYALSRYADISGCIALTNAHTKENVLYEEFLKL 
SCRLAESFKKYGLKQNDTIAVCSENGLQFFLPLIASLYLGIIAAPVSDKYIERELIHSLG
IVKPRIIFCSKNTFQKVLNVKSKLKYVETIIILDLNEDLGGYQCLNNFISQNSDINLDVK 
KFKPNSFNRDDQVALVMFSSGTTGVSKGVMLTHKNIVARFSHCKDPTFGNAINPTTAILT 
VIPFHHGFGMMTTLGYFTCGFRVALMHTFEEKLFLQSLQDYKVESTLLVPTLMAFFAKSA 
LVEKYDLSHLKEIASGGAPLSKEIGEMVKKRFKLNFVRQGYGLTETTSAVLITPDTDVRP 
GSTGKIVPFHAVKVVDPTTGKILGPNETGELYFKGDMIMKSYYNNEEATKAIINKDGWLR
SGDIAYYDNDGHFYIVDRLKSLIKYKGYQVAPAEIEGILLQHPYIVDAGVTGIPDEAAGE
LPAAGVVVQTGKYLNEQIVQNFVSSQVSTAKWLRGGVKFLDEIPKGSTGKIDRKVLRQMF
EKHKSKL 
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FIGURE 46 



DPMMKREKNVIYGPEPLHPLEDLTAGEMLFRALRKHSHLPQALVDVVGDESLSYKEFFEA
TVLLAQSLHNCGYKMNDVVSICAENNTRFFIPVIAAWYIGMIVAPVNESYIPDELCKVMG
ISKPQIVFTTKNILNKVLEVQSRTNFIKRIIILDTVENIHGCESLPNFISRYSDGNIANF 
KPLHFDPVEQVAAILCSSGTTGLPKGVMQTHQNICVRLIHALDPRAGTQLIPGVTVLVYL 
PFFHAFGFSITLGYFMVGLRVIMFRRFDQEAFLKAIQDYEVRSVINVPSVILFLSKSPLV 
DKYDLSSLRELCCGAAPLAKEVAEVAAKRLNLPGIRCGFGLTESTSANIHSLRDEFKSGS 
LGRVTPLMAAKIADRETGKALGPNQVGELCIKGPMVSKGYVKNVEATKEAIDDDGWLHSG 
DFGYYDEDEHFYVVDRYKELIKYKGSQVAPAELEEILLKNPCIRDVAVVGIPDLEAGELP
SAFVVKQPGKEITAKEVYDYLAERVSHTKYLRGGVRFVDSIPRNVTGKITRKELLKQLLE

KAGG 
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FIGURE 47 



GGATCCCATGATGAAGCGAGAGAAAAATGTTATATATGGACCCGAACCCCTACACCCCTT 
GGAAGACTTAACAGCTGGAGAAATGCTCTTCCGTGCCCTTCGAAAACATTCTCATTTACC 
GCAGGCTTTAGTAGATGTGGTTGGCGACGAATCGCTTTCCTATAAAGAGTTTTTTGAAGC 
GACAGTCCTCCTAGCGCAAAGTCTCCACAATTGTGGATACAAGATGAATGATGTAGTGTC 
GATCTGCGCCGAGAATAATACAAGATTTTTTATTCCCGTTATTGCAGCTTGGTATATTGG 
TATGATTGTAGCACCTGTTAATGAAAGTTACATCCCAGATGAACTCTGTAAGGTGATGGG 
TATATCGAAACCACAAATAGTTTTTACGACAAAGAACATTTTAAATAAGGTATTGGAGGT 
ACAGAGCAGAACTAATTTCATAAAAAGGATCATCATACTTGATACTGTAGAAAACATACA 
CGGTTGTGAAAGTCTTCCCAATTTTATTTCTCGTTATTCGGATGGAAATATTGCCAACTT 
CAAACCTTTACATTTCGATCCTGTTGAGCAAGTGGCAGCTATCTTATGTTCGTCAGGCAC 
TACTGGATTACCGAAAGGTGTAATGCAAACTCACCAAAATATTTGTGTCCGACTTATACA 
TGCTTTAGACCCCAGGGCAGGAACGCAACTTATTCCTGGTGTGACAGTCTTAGTATATCT 
GCCTTTTTTCCATGCTTTTGGGTTCTCTATAACCTTGGGATACTTCATGGTGGGTCTTCG 
TGTTATCATGTTCAGACGATTTGATCAAGAAGCATTTCTAAAAGCTATTCAGGATTATGA 
AGTTCGAAGTGTAATTAACGTTCCATCAGTAATATTGTTCTTATCGAAAAGTCCTTTGGT 
TGACAAATACGATTTATCAAGTTTAAGGGAATTGTGTTGCGGTGCGGCACCATTAGCAAA 
AGAAGTTGCTGAGGTTGCAGCAAAACGATTAAACTTGCCAGGAATTCGCTGTGGATTTGG 
TTTGACAGAATCTACTTCAGCTAATATACACAGTCTTAGGGATGAATTTAAATCAGGATC 
ACTTGGAAGAGTTACTCCTTTAATGGCAGCTAAAATAGCAGATAGGGAAACTGGTAAAGC 
ATTGGGACCAAATCAAGTTGGTGAATTATGCATTAAAGGTCCCATGGTATCGAAAGGTTA 
CGTGAACAATGTAGAAGCTACCAAAGAAGCTATTGATGATGATGGTTGGCTTCACTCTGG 
AGACTTTGGATACTATGATGAGGATGAGCATTTCTATGTGGTGGACCGTTACAAGGAATT 
GATTAAATATAAGGGCTCTCAGGTAGCACCTGCAGAACTAGAAGAGATTTTATTGAAAAA 
TCCATGTATCAGAGATGTTGCTGTGGTTGGTATTCCTGATCTAGAAGCTGGAGAACTGCC 
ATCTGCGTTTGTGGTTAAACAGCCCGGAAAGGAGATTACAGCTAAAGAAGTGTACGATTA 
TCTTGCCGAGAGGGTCTCCCATACAAAGTATTTGCGTGGAGGGGTTCGATTCGTTGATAG 
CATACCAAGGAATGTTACAGGTAAAATTACAAGAAAGGAACTTCTGAAGCAGTTGCTGGA 

GAAGGCGGGAGGT 
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The great majority of viral mRNAs in mouse C127 cells transformed by bovine papillomavirus type 1 (BPV) 
have a common 3' end at the early polyadenylation site which is 23 nucleotides (nt) downstream of a canonical 
poly(A) consensus signal. Twenty percent of BPV mRNA from productively infected cells bypasses the early 
polyadenylation site and uses the late polyadenylation site approximately 3,000 nt downstream. To inactivate 
the BPV early polyadenylation site, the early poly(A) consensus signal was mutated from AAUAAA to 
UGUAAA. Surprisingly, this mutation did not result in significant read-through expression of downstream 
RNA. Rather, RNA mapping and cDNA cloning experiments demonstrate that virtually all of the mutant RNA 
is cleaved and polyadenylated at heterogeneous sites approximately 100 nt upstream of the wild-type early 
polyadenylation site. In addition, cells transformed by wild-type BPV harbor a small population of mRNAs with 
3' ends located in this upstream region. These experiments demonstrate that inactivation of the major poly(A) 
signal induces preferential use of otherwise very minor upstream poly(A) sites. Mutational analysis suggests 
that polyadenylation at the minor sites is controlled, at least in part, by UAUAUA, an unusual variant of the 
poly(A) consensus signal approximately 25 nt upstream of the minor polyadenylation sites. These experiments 
indicate that inactivation of the major early polyadenylation signal is not sufficient to induce expression of the 
BPV late genes in transformed mouse cells. 



Eukaryotic RNA polymerase II transcription units are typically
transcribed past the mature mRNA 3' end. These transcripts are
then cleaved and a poly(A) tract of 200 to 300 nucleotides (nt) is
added to generate the 3' end of the mature mRNA (for reviews, see
references 30 and 37). Eighty to ninety percent of animal cell
mRNAs contain the sequence AAUAAA 10 to 30 nt upstream of the
poly(A) tail. Another 10% have the variant AUUAAA; other variants
are rare (38). These consensus sequences have been shown to be
required for efficient and accurate cleavage and polyadenylation
both in vivo and in vitro (17, 27, 29). Generally, when this
sequence is mutated, polyadenylation occurs at a downstream site,
often with reduced efficiency (17). The region upstream and
downstream of the AAUAAA consensus signal, including GU-rich
downstream sequences, has also been identified as playing a role
in the cleavage and polyadenylation of some transcripts (30, 37).
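
The positional rule described above (a consensus hexamer 10 to 30 nt upstream of the poly(A) tail) can be illustrated with a short Python sketch. This is only an illustration under stated assumptions: the function name, the toy sequence, and the convention of measuring distance from the 3' end of the hexamer to the cleavage position are all hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: names, the toy sequence, and the distance
# convention are assumptions, not part of the original study.
CONSENSUS_SIGNALS = ("AAUAAA", "AUUAAA")


def find_polya_signals(rna, cleavage_site, min_dist=10, max_dist=30,
                       signals=CONSENSUS_SIGNALS):
    """Return (start, hexamer, distance) for each signal hexamer whose
    3' end lies min_dist..max_dist nt upstream of cleavage_site.

    cleavage_site is the 0-based index of the first nucleotide after the
    cleavage point; distance counts the nucleotides between the end of
    the hexamer and that index (an assumed convention).
    """
    hits = []
    for start in range(len(rna) - 5):
        hexamer = rna[start:start + 6]
        if hexamer in signals:
            distance = cleavage_site - (start + 6)
            if min_dist <= distance <= max_dist:
                hits.append((start, hexamer, distance))
    return hits


# Toy transcript: 20 nt of filler, the signal, 23 nt of spacer, then a tail.
toy_rna = "G" * 20 + "AAUAAA" + "C" * 23 + "A" * 10
site = 20 + 6 + 23  # index where the poly(A) tail begins in the toy RNA

print(find_polya_signals(toy_rna, site))
# Replacing the signal with UGUAAA removes the only consensus match.
print(find_polya_signals(toy_rna.replace("AAUAAA", "UGUAAA"), site))
```

A scan like this only locates candidate hexamers; it says nothing about the upstream and downstream sequence context that the text notes can also influence cleavage and polyadenylation.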

Bovine papillomavirus type 1 (BPV) induces fibropapillomas in
cattle and transforms a number of cultured rodent fibroblast cell
lines to tumorigenicity. The papillomaviruses are unable to
propagate in such transformed cells, in part because the early
polyadenylation site used by essentially all BPV transcripts in
transformed cells is located between the transcriptional
promoters and L1 and L2, the two genes which encode the virion
proteins (Fig. 1A) (16, 23, 39). Similarly, in BPV-induced skin
fibropapillomas, usage of this early polyadenylation site
precludes expression of the capsid protein genes in transformed
dermal fibroblasts and presumably in the basal keratinocytes as
well (4, 5, 35). In terminally differentiating keratinocytes
which express the capsid proteins and produce virus, about 20% of
the viral mRNA reads through the early polyadenylation site and
is instead polyadenylated approximately 3,000 nt downstream at
the late polyadenylation site (5). Thus, regulation of
polyadenylation at the early site appears to be crucial for viral
late gene expression.

To study signals that control polyadenylation in BPV-transformed mouse C127 cells, we mutated the
early poly(A) consensus signal AAUAAA, located 23 nt upstream of the early poly(A) site. It was
expected that mutant transcripts would now bypass the early poly(A) site and that late region
sequences would be included in stable RNA. RNA mapping experiments instead demonstrated that
mutant transcripts were polyadenylated at heterogeneous sites approximately 100 nt upstream of the
early polyadenylation site used in cells transformed by wild-type BPV. Evidence is presented which
suggests that an unusual variant of the poly(A) consensus sequence, UAUAUA, plays a role in the
regulation of polyadenylation at the upstream polyadenylation sites.

Construction and preliminary characterization of the poly(A) consensus mutant. To disrupt
polyadenylation at the BPV major early polyadenylation site at nt 4203, oligonucleotide-directed
mutagenesis was used to mutate the poly(A) consensus signal at nt 4180 from AAUAAA to UGUAAA
(Fig. 1B), thereby creating a new PvuII cleavage site (23a). The resulting mutation on a
BstXI-to-SalI fragment was reconstructed into the full-length wild-type BPV genome (clone
pBPV-142-6 [33]) to generate mutant pBPV-EPA1. Nucleotide sequence analysis of the fragment
replaced in generating pBPV-EPA1 (nt 3849 and 4450) demonstrated that no extraneous mutations were
introduced during mutagenesis.

The ability of three isolates of pBPV-EPA1 to transform C127 cells was assayed by determining the
efficiency of focus formation after BamHI digestion to release the viral DNA from the plasmid
vector and transfection as described previously (14). All three mutant isolates transformed cells
with approximately the same efficiency as wild-type BPV DNA (data not shown). Cell lines were
derived from pools of foci induced by pBPV-EPA1 (EPA1p) and by wild-type BPV DNA (142-6p). ID13
cells, a C127 cell line transformed by infection with BPV, were used as an additional wild-type
control.

FIG. 1. The BPV genome and design of the EPA1 mutation. (A) The BPV genome linearized at the 3'
end of the late transcription unit is shown. Open boxes indicate translational open reading
frames. The long horizontal arrow indicates the direction of transcription. The short horizontal
arrows indicate the positions of major promoters. The late promoter is designated PL. AE and AL
denote the early and late polyadenylation sites, respectively. B indicates the position of the
unique BamHI site. Nucleotide numbers are shown at the bottom. (B) The EPA1 mutation. The top line
shows the wild-type BPV DNA sequence around the early polyadenylation signal. The sequence of the
mutagenic oligonucleotide LI is shown directly below it, with the base substitutions shown in
boldface. The template was the small BamHI to EcoRI fragment of BPV-1 DNA cloned in M13mp8 (13).
The sequence at the bottom shows the mutation with the new PvuII site indicated. The open reading
frame L2 initiation codon is designated the MET codon.



Southern blot analysis of viral DNA from transformed cells demonstrated that the mutant viral DNA
was maintained in transformed cells as a multicopy plasmid without gross rearrangement and with
restoration of the BamHI site used to excise the viral DNA from the plasmid vector (data not
shown).

Mapping the 3' end of the mutant mRNA. The mutation in pBPV-EPA1 was designed to eliminate
polyadenylation at the wild-type early polyadenylation site immediately upstream of the late open
reading frames. Extensive Northern (RNA) blot analysis and RNA protection experiments failed to
detect significant amounts of RNA extending past the polyadenylation site into the late region,
but these experiments did not exclude the presence of low levels of read-through RNA (data not
shown). There was severalfold more stable viral RNA in cells transformed by wild-type BPV than in
those transformed by the polyadenylation site mutant (Fig. 2).

RNase protection experiments were performed to map the 3' ends of the mutant transcripts. ID13 and
EPA1p RNAs were assayed for protection of an antisense EPA1 RNA probe spanning the early
polyadenylation site at nt 4203 (Fig. 2, left panel). The size of the fragment protected by RNA
from ID13 cells indicates that, as expected, the wild-type viral RNA extends past nt 4180, the
site of the mutation in the probe (lane c). In contrast, EPA1p RNA protected several fragments
approximately 100 nt shorter than those protected by wild-type RNA, suggesting that the mutant RNA
is polyadenylated upstream of the normal position (lanes a and b). The difference in the pattern
of protected bands between the two EPA1p lanes, a and b, is due to the different cleavage
specificities of the two RNases used in these reactions. The same result was obtained with
oligo(dT)-selected EPA1p RNA (data not shown), indicating that these shorter species are
polyadenylated, a conclusion confirmed by cDNA cloning (see below). There was no evidence of
significant polyadenylation of EPA1p RNA at the usual position, nor were prominent shorter novel
bands protected in the ID13 sample. RNA from two additional cell lines generated with the original
isolate of the mutant and two additional cell lines generated with independent isolates of the
mutant showed the protection pattern characteristic of the mutant (data not shown). These results
suggested that sequences downstream of nt 4100 were absent from mutant RNA, an interpretation
supported by the results of protection experiments with additional antisense probes and the
results of Northern blot hybridization experiments with oligonucleotide probes (data not shown).
These results are interpreted in the right panel of Fig. 2.

cDNA cloning and sequencing. The results presented above suggest that new heterogeneous
polyadenylation sites near nt 4100 are utilized in EPA1p RNA. To confirm this interpretation, the
3' ends of both wild-type and mutant RNAs were cloned and sequenced. Oligo(dT)-selected (3) 142-6p
and EPA1p RNAs were reverse transcribed with oligo(dT) as primer and the reagents and protocol of
a cDNA synthesis kit (Amersham). The resulting first-strand cDNAs were amplified by the polymerase
chain reaction (PCR) method with the primers diagrammed in Fig. 3A (18, 32, 34). To specifically
amplify BPV sequences, the upstream PCR primer PCR5 corresponded to BPV nt 3998 to 4031. To
selectively amplify polyadenylated molecules, the downstream PCR primer PCRT was
5' d(GGGGATCCT25) 3', which hybridized to any product containing a poly(A) tract. Annealing was
carried out at 25°C, because PCRT has a calculated Tm of 38.9°C in PCR buffer conditions (32). The
products of each amplification reaction were cloned into pUC18, and colonies containing an insert
were identified by colony hybridization (22) with an oligonucleotide probe PCR1 complementary to a
region (nt 4063 to 4089) between the upstream primer and the proposed 3' end of mutant RNA.

FIG. 2. RNase protection analysis of viral early region RNA in transformed cells. (Left panel) Ten
micrograms of total cellular RNA (9, 21) from EPA1p (lanes a and b), ID13 (lane c), and C127
(lane d) cells was hybridized to an antisense EPA1 RNA probe (28) complementary to BPV nt 3912 to
4450, spanning the position of the normal early polyadenylation site but containing the mutation
at the polyadenylation signal. Hybrids were digested with either 2 µg of RNase T1 per ml (lane a),
40 µg of RNase A per ml (lane b), or a mixture of both RNases (lanes c to e), and protected
fragments were detected by autoradiography after electrophoresis through a 4% polyacrylamide-50%
urea gel. The sample in lane e was the probe digested after mock hybridization; the sample in
lane f is undigested probe. The nucleotide lengths of size markers in lane g are indicated. The
arrowhead indicates the predicted position of a 270-nt fragment extending from the 3' end of the
probe to the site of the mutation, which is generated by cleavage at the mismatch between the
wild-type RNA and the mutation in the probe. The vertical line on the left indicates the small
cluster of bands protected by mutant RNA. (Right panel) Schematic representation of the probe,
protected fragments generated by RNase digestion, and the deduced structure of viral RNA species.
Arrows indicate the direction of transcription, with the arrowheads representing the 3' end of
each transcript. The X indicates the position of the mutation in the probe.

The results of sequence analysis of the cDNA clones are summarized in Fig. 3B. Sites of
polyadenylation were identified as junctions between BPV DNA sequence and tracts of poly(A). Six
of the 11 clones derived from cells transformed by wild-type BPV were polyadenylated after nt
4203, the previously described early polyadenylation site (39), thus validating this strategy of
identifying polyadenylation sites. In contrast, none of the clones derived from mutant RNA
displayed the wild-type polyadenylation site. Instead, seven of the nine EPA1p clones contain a
stretch of poly(A) immediately after BPV nt 4107, and the other two clones contain poly(A) after
nt 4101 and 4092. These results are consistent with the RNase protection and Northern blot results
which indicate the existence of heterogeneous 3' ends near nt 4100 in mutant RNA and demonstrate
that these new 3' ends are in fact new sites of polyadenylation. Interestingly, the anomalous
clones (almost half) derived from wild-type RNA showed polyadenylation at heterogeneous sites
similar to those found with the mutant RNA. These results indicate that there is a population of
mRNAs with heterogeneous polyadenylation sites around nt 4100 in cells transformed by wild-type
BPV. The preferential amplification of shorter PCR products may explain the relatively frequent
isolation of these shorter cDNAs from cells transformed by wild-type BPV. We have occasionally
observed faint bands in protection experiments with ID13 RNA which are consistent with minor sites
of polyadenylation at these upstream positions (data not shown).
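The clone counts reported in this section can be restated as site-usage fractions. The sketch below is purely illustrative: the variable and function names are hypothetical, and because the text gives no individual positions for the five anomalous wild-type clones, they are grouped under a single label rather than assigned coordinates.

```python
from collections import Counter

# Junction positions (BPV nt) as reported in the text. The five anomalous
# wild-type clones have no individual positions in the text, so they share
# one label instead of invented coordinates.
UPSTREAM = "upstream (near nt 4100)"
wild_type_clones = [4203] * 6 + [UPSTREAM] * 5
epa1p_clones = [4107] * 7 + [4101, 4092]


def site_fractions(junctions):
    """Map each junction site to the fraction of clones that end there."""
    counts = Counter(junctions)
    total = len(junctions)
    return {site: count / total for site, count in counts.items()}


if __name__ == "__main__":
    for name, clones in (("142-6p", wild_type_clones), ("EPA1p", epa1p_clones)):
        for site, fraction in site_fractions(clones).items():
            print(f"{name}\t{site}\t{fraction:.2f}")
```

As the text itself cautions, preferential amplification of shorter PCR products may inflate the upstream class among the wild-type clones, so these fractions describe the clone collection rather than true site usage in the cell.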

Identification of a signal controlling polyadenylation at the upstream sites. There is no poly(A)
consensus sequence or previously described functional variant within 100 bp upstream of nt 4100.
However, the sequence UAUAUA is present at nt 4073, approximately 30 nt 5' to the poly(A) sites in
EPA1p RNA (Fig. 3B). It is the closest match to the consensus sequence in the region, and it
appears to be in the appropriate position to specify cleavage at the sites detected in mutant RNA.
To test the role of this sequence in specifying polyadenylation in the absence of the wild-type
signal, it was mutated from UAUAUA to GAUAUC by using the mutagenic primer
5' d(AA(rrTCATAC AGGATATCAA ACAAATCA)3', corresponding to BPV sequence from nt 4063 to 4090, and
single-stranded EPA1 DNA as a template. The resulting mutant, pBPV-EPA2 (see Fig. 5), therefore
contained both the original mutation at the poly(A) consensus signal and the new mutations in the
putative variant signal. This mutant transformed C127 cells with approximately wild-type
efficiency, and RNA from a pooled cell line transformed by EPA2 DNA was mapped by using RNase
protection and an antisense EPA2 probe (Fig. 4). RNA from ID13 cells protected the fragment sizes
predicted if cleavage occurred at the sites of mismatch between wild-type RNA and the probe (which
contains mutations at nt 4073, 4078, 4180, and 4181) (lane b). EPA2 RNA protected two major size
classes of fragments (lane a). One was a set of probe fragments approximately 190 to 200 nt long,
corresponding to polyadenylation near nt 4100 as in EPA1p RNA. These protected fragments comigrate
with the fragments protected by EPA1 RNA (data not shown) and are the size predicted if
polyadenylation occurred at nt 4107, the mutant site mapped by cDNA cloning. In addition, EPA2p
RNA protected several longer fragments corresponding to heterogeneous RNA 3' ends between nt 4200
and 4450. The EPA2 mutation thus reduced the efficiency with which the upstream polyadenylation
sites are used, but it did not appear to affect the position of poly(A) addition for those
transcripts that are successfully polyadenylated in this region. These results indicate that the
UAUAUA plays a role in specifying the new upstream sites of polyadenylation in EPA1p RNA.
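The fragment sizes quoted for the protection experiments follow from simple coordinate arithmetic on the probe, which the figure legends describe as covering BPV nt 3912 to 4450. The sketch below assumes that the protected length is the distance from the probe boundary at nt 3912 to the RNA 3' end or mismatch, and that the RNA extends upstream past nt 3912. The function name and the off-by-one convention are assumptions; the convention was chosen because it reproduces the 538-nt probe length given in the legend to Fig. 4.

```python
# Probe coordinates (BPV nt) taken from the figure legends.
PROBE_START, PROBE_END = 3912, 4450


def protected_length(rna_end, probe_start=PROBE_START, probe_end=PROBE_END):
    """Approximate length (nt) of probe protected by an RNA ending at `rna_end`.

    `rna_end` is either a mapped poly(A) addition site or the position of a
    probe/RNA mismatch where the RNases cut. Positions beyond the probe are
    clipped to its boundaries.
    """
    end = min(rna_end, probe_end)
    if end <= probe_start:
        return 0
    return end - probe_start


if __name__ == "__main__":
    for label, position in (
        ("poly(A) site, nt 4107", 4107),
        ("mismatch at nt 4180", 4180),
        ("full probe", 4450),
    ):
        print(f"{label}: ~{protected_length(position)} nt")
```

Under these assumptions a 3' end at nt 4107 gives about 195 nt, within the 190 to 200 nt range reported for the upstream sites, and cleavage at the nt 4180 mismatch gives about 268 nt, close to the 270-nt fragment marked in Fig. 2.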

Discussion. These experiments were designed to study polyadenylation site usage in BPV-transformed
mouse cells. A point mutation in the poly(A) consensus signal disrupted polyadenylation at that
site both in vivo, as demonstrated here, and in an in vitro polyadenylation system (24). RNase
protection, Northern blotting, and cDNA cloning and sequencing established that stable mutant
transcripts utilized heterogeneous polyadenylation sites approximately 100 nt upstream of the
wild-type polyadenylation site. The results of the cDNA cloning also demonstrated that some
wild-type transcripts have heterogeneous 3' ends in the region used by the mutant RNAs, indicating
that this is a minor polyadenylation site in cells transformed by wild-type BPV. In a related
system, Doniger et al. (15) found usage of upstream polyadenylation sites by human papillomavirus
type 16 transcripts in an immortalized human exocervical epithelial cell line harboring a human
papillomavirus type 16 genome with an extensive deletion immediately downstream of a wild-type
early poly(A) signal. The 3' ends of viral RNA from these cells mapped to both the normal site and
to a heterogeneous region 400 to 500 nt upstream of that site.

FIG. 3. (A) PCR-based strategy to clone the 3' ends of viral RNA. The top portion of the panel
shows the deduced structure of the 3' ends of the major viral RNAs. The upstream primer, PCR5, is
complementary to all known wild-type and mutant viral early RNAs. The downstream primer, PCRT,
consists of oligo(dT) and a cloning site but no BPV-specific sequences. After amplification with
cDNA reverse transcribed from polyadenylated RNA as a template, BPV cDNAs were cloned into pUC18,
identified by hybridization to PCR1, and sequenced. (B) cDNA clone sequences. The sense strand BPV
sequence from nt 4011 to 4210 is shown. The normal poly(A) consensus signal is enclosed in the
solid line, and the putative upstream poly(A) signal is enclosed in the dashed line. Each dot
indicates the position of a junction between BPV DNA and the poly(A) tract in a cDNA clone. Clones
derived by amplification of wild-type RNA are represented by dots below the sequence, and those
derived from mutant RNA are represented by dots above the sequence.

The results described here suggest a hierarchy of polyadenylation site usage in BPV-transformed
cells, as is summarized in Fig. 5. Wild-type BPV mRNA is polyadenylated at the major
polyadenylation site at nt 4203, with a small fraction of transcripts being polyadenylated at
minor upstream sites around nt 4100. When the major signal is disrupted (as in EPA1), the sites
around nt 4100 become the predominant sites of polyadenylation. When the major signal is
inactivated and the minor signal is partially disrupted (as in EPA2), both the upstream sites and
new downstream sites between nt 4200 and 4450 are used. Additional experiments have shown that
polyadenylation occurs exclusively at these downstream sites when the major signal is inactivated
and the upstream polyadenylation region is deleted (2). There are several potential
polyadenylation signals in this downstream region, including a sequence at nt 4304 that deviates
by 1 nt from the consensus polyadenylation signal. In addition, Burnett et al. (8) observed
polyadenylation near nt 4450 in RNA from cells transformed by a spontaneous BPV-1 deletion mutant
lacking the major poly(A) site and surrounding sequences. One can speculate that the function of
the multiple potential early polyadenylation sites in BPV is to ensure that late genes are not
expressed under inappropriate conditions, for example in transformed dermal fibroblasts or basal
epidermal keratinocytes.

Polyadenylation site selection appears to be a complex 
process that takes into account both the relative strengths of 
potential sites and their positions relative to one another (12, 
20). Moreover, the representation of polyadenylation sites in 
stable RNA reflects a number of factors in addition to poly- 
adenylation site selection, including the stability of various 
RNA species. The results of the RNase protection experiments 
reported here indicate that the upstream polyadenylation sites 
are used far more abundantly by the early polyadenylation 
signal mutant than by the wild type. However, it is also clear 
that there is less total viral RNA in cells transformed by the 
mutant. It is possible that processing at the upstream sites 
remains relatively inefficient even with the mutant polyadenyl- 
ation signal, resulting in the synthesis of a rapidly degraded 
pool of unprocessed RNA extending into the late region. In 
fact, Furth and Baker (19) have described a sequence element 
in the BPV late region which prevents the accumulation of 
stable viral RNA in transformed cells. 

The closest match to a poly(A) consensus signal in the vicinity of the minor upstream
polyadenylation sites is UAUAUA, approximately 25 nt upstream of the new RNA 3' ends. RNA from
cells containing mutations of both the original poly(A) signal and this putative upstream signal
contains heterogeneous 3' ends at both the upstream sites and at additional positions downstream
of the normal site. This result suggests that the UAUAUA plays a role in directing polyadenylation
at the upstream polyadenylation sites and that the mutation did not fully disrupt the function of
the UAUAUA sequence. We are not aware of a precedent for UAUAUA acting as a poly(A) signal in
mammalian cells, although it can direct mRNA 3' end formation and polyadenylation in Saccharomyces
cerevisiae (31). However, we note that the region around the upstream cleavage sites contains
numerous oligo(dT) tracks and GT dinucleotides, sequence motifs found near some bona fide
mammalian poly(A) signals.

The wild-type poly(A) consensus signal appears to suppress utilization of the variant signal
located approximately 100 nt upstream. Such suppression may be rather general. Connelly and Manley
(10) studied the simian virus 40 early polyadenylation region, which contains two closely spaced
AAUAAA signals. In the wild-type situation, only the 3' site is efficiently utilized. However, if
this preferred site was inactivated by mutation, increased usage of the 5' site was observed. In
addition, Denome and Cole (11) showed that addition of tandemly arranged polyadenylation signals
decreased usage of the upstream site. These findings imply that genomes may contain numerous
potential sites of polyadenylation whose activity is suppressed by the relatively close apposition
of another polyadenylation signal, which perhaps competes more efficiently for a limiting
polyadenylation factor. Therefore, alternative polyadenylation, which is a well-documented control
point for regulating gene expression (25), may result in some cases from inactivation of a
preferred polyadenylation signal rather than by direct activation of a suboptimal one.

FIG. 4. Evidence that UAUAUA at nt 4073 plays a role in poly(A) site selection. (Left panel)
Radiolabelled antisense RNA probe extending from BPV nt 4450 to 3912 was transcribed in vitro from
EPA2, hybridized to 10 µg of cellular RNA isolated from EPA2p (lane a), ID13 (lane b), or C127
(lane c) cells, and digested with a mixture of RNases A and T1. Protected fragments were subjected
to polyacrylamide gel electrophoresis and detected by autoradiography. The arrowhead on the left
indicates the position of approximately 190- to 200-nt probe fragments extending from the 3' end
of the probe to the upstream sites of polyadenylation around BPV nt 4100. The vertical line on the
left indicates the position of the larger fragments also protected by mutant RNA. The
approximately 265-base fragment in lane b appears to be derived from partially digested hybrids.
The lengths (in nucleotides) of coelectrophoresed size markers are shown. P indicates the position
of undigested probe (538 nt). (Right panel) Schematic representation of the antisense probe, sizes
of the protected fragments, and deduced structures of viral RNA species. The X's show the
positions of the mutations at the upstream and downstream polyadenylation signals in the probe and
in RNA isolated from cells transformed by the double mutant.
The mechanism by which BPV prevents expression of viral late genes in transformed cells but allows
their expression in differentiated keratinocytes is central to an understanding of papillomavirus
biology. One level of restriction in transformed cells is clearly at the level of stable mRNA
accumulation, because little or no BPV mRNA from the late region is present in cultured
fibroblasts. Analysis of nascent RNA from ID13 cells indicates that at least 90% of BPV
transcripts terminate between the early and late poly(A) sites and therefore never reach the late
poly(A) signal (6). The mechanism(s) allowing production of late RNAs during natural infection may
act primarily at the level of polyadenylation site selection, or it may act at some other steps in
mRNA biogenesis, such as alterations in promoter usage, splicing patterns, or transcription
termination, which secondarily affect cleavage and polyadenylation (for examples, see references
1, 7, 26, and 36). However, the results presented here indicate that specific suppression of the
major early polyadenylation signal is unlikely to be the sole step in releasing the block to BPV
late gene expression, because inhibition of polyadenylation at additional potential early sites
must also occur. The mechanism involved in late gene expression must coordinately suppress
cleavage and polyadenylation at multiple potential sites near the 3' end of the early region in
some of the transcripts, while many transcripts are still polyadenylated at the early
polyadenylation site. Regulation of BPV late gene expression is clearly a complex process and
bypass of the early major polyadenylation site is a necessary but not sufficient component of that
process. The study of BPV transcriptional regulation promises to provide insights into not only
papillomavirus biology but also the mechanisms of regulation of gene expression in general.

FIG. 5. Usage of early region polyadenylation sites. The horizontal lines represent the region of
the BPV genome around the early polyadenylation site for wild-type BPV DNA (142-6) and the
indicated mutants. Transcription proceeds from left to right. The unbroken box represents the
normal early polyadenylation consensus signal at nt 4180, and the dashed boxes represent the
putative upstream polyadenylation signal at nt 4073. The vertical arrows indicate the positions of
poly(A) addition, and triple arrows indicate heterogeneous polyadenylation sites, with minor sites
represented as dashed arrows. Boxes containing an X indicate a mutant polyadenylation signal.

INTRODUCTION

Biological systems are the masters of chemical synthesis. The remarkable specificity of their
catalysts, the enzymes, allows hundreds of reactions to proceed simultaneously inside the tiny
reactor that is a living cell. Enzymes' ability to carry out complex chemical reactions, and to do
so under very mild conditions with virtually no waste products, has earned them the admiration of
chemists and biochemists. It is easy to envision that a future chemical industry sensitive to both
energy needs and the environment could be modeled after these highly efficient chemical factories.

The molecules responsible for this remarkable performance are the enzymes. Enzymes are proteins,
linear chains of typically hundreds of amino acids that fold up into unique, well-defined
three-dimensional structures. The backbone of the polymer chain folds into a structure that is
unique to the particular catalyst, as illustrated in Fig. 1(a) for the enzyme subtilisin. The
enzyme's substrate (gray), the compound on which the reaction is catalysed, fits snugly into the
substrate binding pocket. The enzyme positions specific catalytic amino acid side chains (red)
where they can assist the chemical reaction to proceed. In Fig. 1(b) the structure of subtilisin
showing its amino acid side chains illustrates the complexity of these molecular machines. This
complexity allows enzymes to perform the truly impressive functions that support life and create
new life. The result of considerable fine-tuning over eons of evolution, this complexity also
makes it difficult to manipulate these structures to obtain new and interesting properties.

An enzyme is defined by a unique sequence of amino acids, which in turn is dictated by the
organism's DNA code (the gene) and assembled in the cell (Fig. 2). This amino acid sequence
determines how the chain folds and, ultimately, how the enzyme functions. By modifying the amino
acid sequence, we can alter the enzyme's function; this field is known as protein engineering.
Despite intense research into fundamental features governing protein folding and function, there
are enormous gaps in our understanding of two critical processes: the relationship between
sequence and structure and the relationship between structure and function. As a result, the
rational design of new proteins by the classical 'reductionist' approach can be a frustrating
exercise indeed. In this article I will introduce a new and highly effective approach to enzyme
design and engineering that bypasses the need to understand these processes before embarking on a
protein engineering project. But first I will explain why the enzymes provided by nature are not
sufficient.

Chemical engineers who try to design real industrial processes using biological catalysis are
constantly stymied by a simple fact: biological systems have evolved over billions of years to
perform very specific biological functions and to do so within the context of a living organism.
Some of the features required for function in a complex chemical network are undesirable when the
catalyst is lifted out of context. Conversely, many of the properties we wish an enzyme would have
clash with the needs of the organism, or at least were never required. The chemical engineer is
hardly impressed by a catalyst whose inability to tolerate the most common of industrial
conditions necessitates complicated hardware and reactors of the size of football fields. We need
catalysts which are stable to high temperatures, can function in solvents other than water,
tolerate wider ranges of pH, catalyse reactions on substrates not encountered in nature, and even
catalyse new reactions not found in nature.

Fig. 2. Protein engineering involves the manipulation of protein structures and functions at the
level of the amino acid (or DNA) sequence. Significant gaps in our understanding of the
relationships between sequence, structure and function severely limit our ability to 'rationally
design' new functions.

Many clues as to how to engineer better enzymes come from studying how nature has created enzymes.
By studying the evolution of natural proteins, we have learned in fact that they are highly
adaptable, constantly changing molecules, at least over evolutionary time scales. They can adapt
to new environments and they can even take on new tasks. We know, for example, that many enzymes
catalysing very different reactions have come about by divergent evolution from a common ancestral
protein of the same general structure, acquiring diverse capabilities by processes of random
mutation, recombination, and natural selection. For example, the versatile protein structure known
as the α/β barrel diverged somewhere in the distant past to create a whole series of enzymes we
know today (Reardon and Farber). The four enzymes shown in Fig. 3(a), for example, catalyse quite
different reactions; their physical properties and amino acid sequences are also quite disparate.
It is useful to note that, while the barrel-like protein fold is highly conserved, the amino acid
sequences and functions of these enzymes are not.

A fascinating recent example of enzyme evolution is the appearance of phosphotriesterase, an α/β
barrel enzyme that hydrolyses, at diffusion-limited rates, pesticides and chemical warfare agents
that have existed only for about 50 years. It has been suggested that this enzyme, discovered in a
soil bacterium, evolved during the last 50 years from a related sequence identified in the common
E. coli bacterium and now known as the 'phosphotriesterase homology protein' (Scanlan and Reid,
1995). The biological function of this latter protein is unknown.

Wc also know that enzymes of a given function (for 
example, all catalysing a panicular step in a metabolic 
pathway) can exhibit widely different properties (stab- 
ility, solubility, tolerance lo pH, etc.), depending on 
where they arc found. For example, the three glyceral- 
dehydft phosphate dehydrogenase (GAPDH) enzymes 
listed in Fig. 3(b) have very similar tbree-dimcnsiOTiuJ 
structures; ihdr sequences arc less similar. Wc know 
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Fig. K The 275 amino aeidi of Aubiilisin E fold iiiio a uniquft ihrcc-iliincnsiontil Hfucture. {a) The backbone 
told » rcprcscnicd horc by n Tibbon* diagram, conjirucicd from X-ray cryonl yirucmrc coofdlnj»«» (Dauicr 
ct uL 1991) using the prosramji MolScnpi and RajterJD. PcpUdo substmu and 
ion> arw xbown In cray. Side chains of cuialyiit amino acid residues arc *ho« n in red. (b) Subtilisin lii l ucuirc 
sl)0\*ing ib« poshion> of ihe amino ucid xUlc cbuiiw (ycllo^vi. 




Fig. 6. Molecular model of subtilisin E showing the 12 amino acid substitutions that increase enzyme activity
in DMF (You and Arnold, 1996). Yellow amino acids were accumulated during screening for enhanced
specific enzyme activity (Chen and Arnold, 1995). Red amino acids were found during screening for total
(expressed) enzyme activity (You and Arnold, 1996). Calcium ions and peptide substrate are shown in gray.





Fig. 10. Molecular model of the pNB esterase showing positions of antibiotic p-nitrobenzyl ester substrate
(yellow), catalytic residues (red), and six beneficial mutations accumulated during directed evolution
(orange) (Moore and Arnold, 1996).
[Figure 3. Panel (a), "One enzyme can become another": ancestral α/β barrel. Panel (b), "Enzymes evolve for
different environments": glyceraldehyde-3-phosphate dehydrogenase (GAPDH) from Homarus americanus
(Tm = 50°C), from Bacillus stearothermophilus (Tm = 75°C), and from Thermotoga maritima.]

Fig. 3. Structure is conserved during evolution, while amino acid sequences and specific functions are often
not. (a) The α/β barrel enzymes indicated appear to have evolved from a common ancestral α/β barrel
protein. (b) Three GAPDH enzymes isolated from different organisms have very similar structures, but
quite different stabilities and amino acid sequences (Buehner et al., 1974; Skarzynski et al., 1987;
Korndörfer et al., 1995).



that they, too, diverged from some common ancestor
a long time ago to occupy their current niches. The
Thermotoga maritima bacterium thrives at very high
temperatures in ocean thermal vents; consequently, its
enzymes can tolerate much higher temperatures than
the analogous enzymes from an organism which
grows under less extreme conditions, such as B.
stearothermophilus. The Thermotoga protein unfolds
at 98°C, while the same enzyme from the American
lobster unfolds at only 50°C. As with the α/β barrel
enzymes, the structural fold of the GAPDHs is highly
conserved, while the detailed amino acid sequences
and specific properties are not.

DIRECTED EVOLUTION: EXPLORING NEW FUTURES

The explosion of tools that has come out of molecular
biology during the last 20 years has made it possible
for us to consider 'evolving' the components of
biological systems (DNA, RNA and proteins) for
features never required in nature. We can both speed
up the rate and channel the direction of evolution by
controlling mutagenesis (the rate and type of
changes made) and the accompanying 'selection'
pressures. As a result, processes that would take
millions of years in nature can in principle be
accomplished during the time scale of a Ph.D. thesis. By
uncoupling the enzymes from the constraints of
function within a living system, we can step into
and explore a variety of futures, futures that can
include novel environments (evolution in a sea of
methanol instead of water?) or even entirely new
functions (enzymes to break down hazardous chemicals?).
We can explore questions such as 'can one catalytic
activity become another, and how?' Furthermore, by
evolving new functions and thereby new solutions to
molecular design problems, we learn things about
these amazing molecular machines that might never
be revealed if we were to study only those that exist in
nature.

The possibilities for biotechnology are especially
exciting. Directed evolution is a very practical
approach to tailor-making enzymes for a wide range of
applications. In addition to building enzymes with
new features and functions, we can explore important
questions such as 'how might an enzyme change its
sequence and properties to break down or evade
a drug?' We could conceivably anticipate in laboratory
experiments what might happen to drug resistances
in nature. In directed evolution experiments we
could also tune enzymes to function optimally under
conditions specified by us, rather than the context of
the living organism in which it evolved. New enzymes
could be evolved to carry out reactions never required
by living organisms.

DEVELOPING A WORKING STRATEGY FOR DIRECTED
ENZYME EVOLUTION
In a directed evolution experiment, we first generate
a library of many different possible 'solutions' to
a problem. The next step is to find the correct
solution(s), enzymes that exhibit the desired property.
A conceptual challenge comes in planning how to
create this library of solutions. The number of
possible enzymes one can make is so vast that an
exploration of their functions must be carefully guided in
order to avoid becoming hopelessly lost. A typical
enzyme is a linear polymer of 300 amino acids. With
20 possible amino acids at each position in the chain,
there are 20^300 possible different linear combinations.
If even only a small fraction (say, 1 in 10^[illegible]) of all
these sequences folds into a well-defined
three-dimensional structure, there are still more structured
proteins than there are atoms in the universe! (Note
that even in three billion years, nature has not
had a chance to explore but a tiny fraction of
the possibilities. This also means that there are very
exciting possibilities for future evolution, including
evolution in the test tube.) Because a random
sampling of amino acid sequences is unlikely to lead to the
desired protein, we must begin our exploration by
starting from a point that we hope is close to where we
want to be: an enzyme that approximates what we
want, but is not ideal. Then we evolve it, by
accumulating small changes, similar to what happens in
nature.
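
As a rough check on the scale of this number, the short Python sketch below counts the decimal digits of 20^300 exactly. The figure of about 10^80 atoms in the observable universe is a commonly quoted order-of-magnitude estimate and is an assumption here, not a value taken from this article; all names are illustrative.

```python
# Size of sequence space for a 300-residue protein built from 20 amino acids.
CHAIN_LENGTH = 300
ALPHABET_SIZE = 20
ATOMS_IN_UNIVERSE_LOG10 = 80  # assumed order-of-magnitude estimate, not from the article


def sequence_space_size(length: int = CHAIN_LENGTH, alphabet: int = ALPHABET_SIZE) -> int:
    """Number of distinct linear sequences of the given length."""
    return alphabet ** length


def order_of_magnitude(n: int) -> int:
    """Largest k such that 10**k <= n, computed exactly from the digit count."""
    return len(str(n)) - 1


if __name__ == "__main__":
    exponent = order_of_magnitude(sequence_space_size())
    print(f"20^300 is about 10^{exponent}")
    print(f"that exceeds the assumed atom count by a factor of about "
          f"10^{exponent - ATOMS_IN_UNIVERSE_LOG10}")
```

Python integers are arbitrary precision, so the count is exact rather than a floating-point approximation.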

Nature is very good at searching mutant libraries
for useful solutions. Unfavorable mutations are
winnowed out at the same time as beneficial mutations
are amplified, by linking the organism's growth rate
and reproductive success to the performance of its
components. In this process of selection, those
organisms which grow faster quickly dominate, allowing an
efficient search of very large populations (10^[illegible] or more
for bacteria).

Unfortunately, many of the features that are of
interest to us cannot be linked to the survival or
growth of the host organism, the prerequisite to
selection. Enzymes, for example, can tolerate a variety
of environments that cannot sustain life, so that the
organism dies long before the enzyme has a chance to
'show its stuff'. For most problems of practical
interest, in fact, mutant enzyme libraries must be screened
rather than selected, one enzyme at a time. That is,
the enzyme variants must be tested individually
(screened) for the property of interest. This
unfortunate reality effectively limits the search for
improvements to mutant libraries containing perhaps
10^[illegible]-10^[illegible] variants, several orders of magnitude smaller
than what one can search when survival depends on
success.

The strategy for molecular evolution is then
illustrated by calculating how many different sequences
one can create by starting from a given enzyme and
making a few amino acid substitutions, as shown in
Table 1. While there are only 5700 possible single
mutants of a 300 amino acid enzyme, there are still
more than 30 billion different sequences that differ
from the original enzyme at only three positions.
While a rapid screen might be able to cover a large
fraction of all single mutants, and even some
significant fraction of all double mutants, screening would
be unable to give more than a very sparse sampling of
the enzymes with multiple amino acid substitutions.
Unless a vast majority of the mutations led to the
desired property, dealing with a library of multiple
mutations would be an experiment based on wishful
thinking! (As might be expected for a finely tuned
molecular machine, most mutations are deleterious or
at least neutral; beneficial mutations are generally
rare. The frequency with which one can expect to find
beneficial mutations will depend on the extent to
which the particular feature of interest has already
been optimized: the pathways up the mountain
necessarily decrease in number as the pinnacle is
approached.) Because luck is generally not an acceptable
basis for the success of an experiment, the search is
effectively limited to proteins with sequences and
therefore properties very similar to their parents. In
addition, we must be able to tune the rate or mode of
mutation to produce libraries of primarily single
amino acid substitutions.
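
The counts quoted above are consistent with choosing M positions out of 300 and allowing any of the 19 other amino acids at each chosen position. That formula is inferred from the numbers themselves, since the note to Table 1 is incomplete in this copy; the function name below is illustrative.

```python
from math import comb


def count_variants(n_changes: int, length: int = 300, alphabet: int = 20) -> int:
    """Sequences that differ from the parent at exactly n_changes positions.

    comb(length, n_changes) picks the positions; each picked position can
    take any of the (alphabet - 1) residues other than the parent's.
    """
    return comb(length, n_changes) * (alphabet - 1) ** n_changes


if __name__ == "__main__":
    for m in range(1, 6):
        print(f"{m} change(s): {count_variants(m):,} variants")
```

Under this assumption the single- and double-mutant counts reproduce the 5700 and 16,190,850 given in the text and table, and the triple-mutant count exceeds 30 billion.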

The principles and power of directed evolution are
best illustrated with examples. The first example will
be the evolution of an enzyme to function in a polar
organic solvent. It is well known that subtilisin, which
normally cuts up peptides and proteins by cleaving
the peptide bonds linking the amino acids together,
will also catalyse peptide bond formation. Peptide
bond formation is favored in organic media, as water
participates in unwanted side reactions as well as
hydrolysis of the product. Subtilisin actually remains
folded and reasonably stable in high concentrations of
polar organic solvents such as dimethylformamide
(DMF). Unfortunately, the catalytic activity is very
low. There is no fundamental reason, however, why
subtilisin could not function in DMF. The enzyme's
unhappiness reflects a balance among a very large
number of noncovalent interactions in the system
(protein, solvent, substrate, and products), a
balance that is adversely affected when the protein is
dissolved in a nonaqueous medium. Because these
complex interactions are poorly understood, we could
not address this problem by a rational design
approach. We therefore took the 'irrational' approach
and asked whether we could 'evolve' a subtilisin that
would function well in DMF (Chen and Arnold, 1991,
1993; You and Arnold, 1996).

The arguments set out above led us to the strategy
for directing the evolution of an enzyme to perform
a new function (or, in this case of subtilisin, an old
function but under new conditions) illustrated in
Fig. 4. In comparison to the enzyme performing
a function for which it is selected, peptide hydrolysis
in aqueous media, the new job is performed poorly
indeed. Subtilisin has not been selected for hydrolysis
in DMF, and there is, not surprisingly, a great deal of



Table 1. The molecular evolution 'number problem'

No. of amino acid changes    No. of possible variants
1                            5700
2                            16,190,850
3                            30,557,530,900
4                            43,109,036,717,100
5                            48,489,044,499,400,000

Note: Starting with an enzyme of 300 amino acids, the
number of sequences containing M amino acid substitutions
room for improvement. Because it is feasible to search
only those subtilisin mutants with one or two amino
acid substitutions, we will create and screen a library
of such mutants for progeny slightly better than their
parent. The screening method for identifying useful
mutants should ensure that the expected small
enhancements brought about mainly by single mutations
can be measured. Although these progeny will
generally resemble their parents, after many generations
new features can develop, such that the descendants
can be quite different from their ancestor. Therefore,
the generation of new, useful enzymes also relies on
having an effective strategy for accumulating many such
small improvements. One such strategy involves
carrying out sequential generations of random mutagenesis
on the gene (DNA sequence coding for the enzyme) to
create a mutant library, coupled with screening of the
resulting proteins. In each generation a single variant
is chosen as the parent for the next generation, and
sequential cycles allow the evolution of the desired
features.
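
The cycle described above (mutate the parent, screen a library, keep one improved variant as the next parent) can be sketched as a toy simulation. Everything in it is hypothetical: the "activity" is simply the number of positions matching an arbitrary target string, standing in for a real screen, and the library sizes are illustrative rather than taken from the experiments.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def mutate(parent: str, rng: random.Random) -> str:
    """Return a copy of parent with exactly one residue substituted."""
    pos = rng.randrange(len(parent))
    choices = [a for a in AMINO_ACIDS if a != parent[pos]]
    return parent[:pos] + rng.choice(choices) + parent[pos + 1:]


def toy_activity(seq: str, target: str) -> int:
    """Hypothetical screen: count of positions that match an arbitrary target."""
    return sum(a == b for a, b in zip(seq, target))


def evolve(parent: str, target: str, generations: int = 10,
           library_size: int = 200, seed: int = 0):
    """Sequential rounds of single-substitution mutagenesis and screening."""
    rng = random.Random(seed)
    history = [toy_activity(parent, target)]
    for _ in range(generations):
        library = [mutate(parent, rng) for _ in range(library_size)]
        best = max(library, key=lambda s: toy_activity(s, target))
        # Only an improved variant replaces the parent (uphill steps only).
        if toy_activity(best, target) > toy_activity(parent, target):
            parent = best
        history.append(toy_activity(parent, target))
    return parent, history


if __name__ == "__main__":
    target = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
    final, history = evolve("A" * len(target), target)
    print(history)
```

Because each library member carries a single substitution, the toy activity can rise by at most one per generation, which mirrors the small, stepwise improvements the strategy relies on.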

We implemented this strategy to evolve subtilisin to
function in DMF. A powerful molecular biology tool,
the polymerase chain reaction (PCR), was used to
make millions of copies of the gene that codes for the
natural, or wild-type, enzyme. By carrying out this
(enzymatic) reaction under sub-optimal conditions,
we could introduce base substitutions randomly
throughout the DNA at a controllable rate. At the end
of this reaction we have millions of gene copies, most
slightly different from the wild-type one. These genes
are placed back into a circular double-stranded piece
of DNA (a plasmid) that contains all the instructions
the bacterial cells need to translate the DNA into
protein. When the bacteria are transformed with these
plasmids, we have millions of individual chemical
factories, each producing a different variant of the
original enzyme.
Fig. 4. A working strategy for directed enzyme evolution.
The screening method should ensure that small enhance-
Fig. 5. Results of directed evolution of subtilisin for activity
in DMF by sequential generations of random mutagenesis
and screening. The accumulation of 12 amino acid
substitutions in sequential generations of random mutagenesis and
screening resulted in an enzyme >500-fold more active than
the wild-type enzyme in 60% DMF.



Next, the bacterial colony or colonies which
produce a subtilisin that is more active in DMF must be
found. In this early experiment our screening strategy
was crude, but effective. Because subtilisin is secreted
from the bacilli, the variants could be screened
visually on nutrient plates containing a protein (casein),
in the presence and absence of DMF. The active
enzyme creates a visible 'halo' surrounding the
bacterial colony whose size is proportional to the catalytic
activity.* Variants with higher hydrolysis activity
than wild-type on the DMF-containing plates
could be identified from their bigger halos (Chen and
Arnold, 1991).

The results of the directed evolution effort are
summarized in Fig. 5. At first we identified three amino
acid substitutions that individually improved the
wild-type enzyme's activity several-fold. Using
site-directed mutagenesis we combined those three with
a fourth mutation reported to improve activity and
stability in other subtilisins, to obtain a four amino
acid variant about 40-fold more active than wild-type
in 60% DMF (Chen and Arnold, 1991). Since the
process of sequencing the genes of all the positive
variants and then combining the mutations by
site-directed mutagenesis was laborious, we decided to
carry out sequential generations of random
mutagenesis and screening, no longer stopping on the way to
sequence the intermediates. Applying an additional
six generations of mutagenesis and screening a few
hundred colonies in each generation, we created an



*Because halo size also depends on enzyme expression
level, enzyme diffusion and colony size, it is useful for
a rough cut. Positives were confirmed by a second level of
screening in liquid culture (Chen and Arnold, 1993).



enzyme that is more than 500-fold more active in 60%
DMF than the wild-type subtilisin E (You and
Arnold, 1996). This enzyme exhibits substantial activity
even in 85% DMF. The whole process was
surprisingly rapid: a total of only about 10,000 colonies were
screened to obtain a huge improvement in catalytic
activity.

The gene for the final evolved enzyme was
sequenced to determine the amino acid substitutions
that allowed this enzyme to recover its activity in
DMF. Of 275 amino acids, 12 were altered; their
positions are indicated in Fig. 6. Although the DNA
substitutions are targeted randomly throughout the
entire subtilisin gene sequence, the amino acid
substitutions that enhance catalytic activity are all
positioned on the surface of the enzyme, surrounding the
active site and substrate binding pocket. The majority
are in evolutionarily variable loops that connect
elements of conserved secondary structures (helices and
sheets) (Chen and Arnold, 1993; You and Arnold,
1996). This information could of course be utilized in
developing more 'rational' design strategies, including
narrowing the sequences exposed to random
mutagenesis in directed evolution.

Finally, it is worth noting that the resulting enzyme
is indeed a far more efficient catalyst than wild-type
subtilisin for the polymerization of amino acids. This
evolved enzyme can catalyse, for example, the
formation of poly-L-methionine starting from a racemic
mixture of methionine methyl ester. The evolved
enzyme allows the synthesis of significantly longer
polymers and at much higher yields than the native
enzyme in 60-70% DMF (Zhao, H., unpublished
results).

The advantage of directed evolution over
site-directed mutagenesis is clear: the same amount of effort
could support the construction and screening of at
most a few dozen variants with mutations directed to
specific locations. Without a clear mechanism, it
would be difficult indeed to pinpoint 12 amino acid
substitutions that enhance catalytic activity in DMF.
Even then, single site-directed mutations would have
to be accumulated to create a useful enzyme, itself
a substantial mutagenesis effort involving trial and
error to find optimal combinations.

The most attractive feature of the evolutionary
strategy outlined in Fig. 4 is its simplicity. It is
possible, however, that this simple 'up-hill climb'
approach is not an optimal approach to the evolution of
a particular enzyme. There are obviously a great
number of pathways possible for the evolution of a
protein, and each choice of parent for the next generation
represents an irreversible step along one particular
pathway. What would happen if we simply repeated
the experiment? Depending on which pathway was
chosen or which mutation happened to be found first,
the enzyme could end up on a local optimum, unable
to evolve further. This approach may also appear
slow; improvements are small in each step and
necessarily become harder to find the closer the enzyme
gets to an optimum.
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SEX IN THE TEST TUBE 

An alternative directed evolution strategy we have
recently explored incorporates some important
advantages attributed to sex in the evolutionary process.
Gene recombination, the cutting and pasting of whole
genes or pieces of genes, can significantly increase the
speed of molecular evolution by rapidly accumulating
beneficial mutations and providing a mechanism to
remove deleterious ones. To incorporate
recombination into directed evolution, we randomly recombine
genes with positive mutations. A search for better
combinations of mutations completes a generation of
directed evolution.

We have tested this new 'sexual' approach by
creating an enzyme that efficiently catalyses the
hydrolysis of the p-nitrobenzyl (pNB) ester of a β-lactam
antibiotic in the presence of DMF. The pNB
protecting group is often used during the large-scale synthesis
of cephalosporin-type antibiotics. Its selective
removal presents problems, however, particularly for
recovery and disposal of the zinc catalyst and the
large amounts of organic solvents used. Therefore,
a major pharmaceutical company devoted significant
effort some years ago to finding an enzyme that would
perform this selective hydrolysis reaction (Brannon et
al., 1976; Zock et al., 1994). An enzyme with some
activity towards pNB ester hydrolysis was identified
by screening a large number of microorganisms, but
the enzyme's low activity, especially in the solvents
required to solubilize these materials, made it a poor
competitor to the classical chemical catalyst.

We were challenged in 1994 to evolve a pNB
esterase with much higher activity, particularly in the
presence of the polar organic solvents required to
achieve high substrate solubility. We had two reasons
to believe that this could be done. First, the enzyme's
natural function and, therefore, natural substrates are
unknown, but they are unlikely to be the antibiotic
pNB esters. Second, the natural enzyme's activity is
very sensitive to organic solvents. Because these
features were never required in the enzyme's natural
setting, we could expect considerable improvement
through directed evolution.

The wild-type esterase is not secreted by the E. coli
cells in which it is made, nor does it carry out a
reaction that is easily measured. Thus, we had to develop
screening strategies more sophisticated than those
used for the subtilisin. The p-nitrobenzyl ester
hydrolysis reaction is assayed laboriously by high
performance liquid chromatography, a method
unsuitable for screening tens of thousands of colonies. We
therefore devised a rapid screening assay using a
similar, but not identical, p-nitrophenyl ester substrate, in
order to have an easy-to-read colorimetric signal. The
screening reactions could then be carried out in the 96
wells of a plastic microtiter plate, using an automatic
spectrophotometer to read and analyse the
absorbance in all 96 wells at once.

Using this rapid assay to screen about a thousand
colonies per generation, we completed several
sequential cycles of random PCR mutagenesis and screening,
as illustrated in Fig. 7 (Moore and Arnold, 1996).
After four generations, the enzyme's specific activity in
15% DMF had improved 15-fold. In the fourth
generation, we collected not one, but 64 different
clones, some of which were better than the parent,
and many of which were not. The purpose for this was
two-fold. First, we wanted to make sure that our
screening strategy was working properly to give us an
enzyme that would catalyse the desired p-nitrobenzyl
hydrolysis reaction, not only the colorimetric
p-nitrophenyl screening reaction. The activities of each of
the 64 clones in both reactions are compared in Fig. 8.



[Figure 7: plot of activity relative to wild-type in 15% DMF against extent of mutation; the final step is
labelled 'recombination'.]

Fig. 7. Directed evolution of pNB esterase in 15% DMF
involved four generations of random mutagenesis and
screening, followed by one round of recombination of the
five best genes from generation 4. The best variant obtained
after four generations is 15-fold more active than wild-type.
The best variant from screening 400 colonies of the
recombination pool is ~[illegible]-fold more active than wild-type.
Fig. 8. Comparison of activities on target (p-nitrobenzyl)
and screening (p-nitrophenyl) substrate of 64 pNB esterase
variants isolated after fourth generation of random
mutagenesis and screening, relative to parent enzyme from the third
generation. The five most active variants (inside oval) were
pooled for random recombination (see Fig. 9).
If the screening reaction perfectly mimicked the
desired reaction, all the points would lie on the
45° line. Although somewhat scattered, there is
nonetheless a reasonable correlation: the rapid screen
provides an indication of evolution of the desired
activity that is acceptable for making a rough cut of
positive clones.

The second reason for studying this group of
variants was to test the alternate, sexual approach for
accumulating effective mutations. We thus collected
the five best mutants, those in the dotted oval in
Fig. 8, and recombined them using a 'sexual' PCR
method recently described by Stemmer (1994a, b).
How the genes are randomly recombined is shown
schematically in Fig. 9(a). The genes are pooled in the
test tube and fragmented with an enzyme that cuts the
DNA at random positions. In Fig. 9(b), the
polyacrylamide electrophoresis gel that separates the DNA
fragments by length shows that the DNA has been
digested into a smear of different-sized pieces. We
collected the fragments 200-300 base pairs in length
by extracting the DNA from the appropriate piece of
gel. The full-length gene can be reassembled from this
pool of random fragments, again using the PCR
technology, to create a new gene library in which the
mutations were present in their different possible
combinations. These reassembled, recombined genes
were inserted back into the plasmid and expressed in
the E. coli. The best of those recombined genes were
identified, as before, by screening the enzymes they
code for and produce in the microorganisms.
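
The effect of recombination on a pool of parent genes can be sketched with a deliberately simplified model. This is not the fragmentation-and-reassembly protocol itself: it assumes equal-length parents and fixed-size blocks, each block being copied from a randomly chosen parent, and the function and parameter names are hypothetical.

```python
import random


def shuffle_genes(parents: list[str], block: int, rng: random.Random) -> str:
    """Toy recombination: build a child block by block from random parents."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("all parents must have the same length")
    pieces = []
    for start in range(0, length, block):
        donor = rng.choice(parents)  # each block comes from one parent
        pieces.append(donor[start:start + block])
    return "".join(pieces)


def make_library(parents: list[str], size: int, block: int = 4, seed: int = 0) -> list[str]:
    """Generate a library of recombinant sequences for screening."""
    rng = random.Random(seed)
    return [shuffle_genes(parents, block, rng) for _ in range(size)]


if __name__ == "__main__":
    # Lowercase letters mark hypothetical point mutations in three parents.
    parents = ["AAAAtAAAAAAA", "AAAAAAAAcAAA", "gAAAAAAAAAAA"]
    for child in make_library(parents, size=5):
        print(child)
```

In this model a child can combine mutations that arose in different parents, or lose a mutation by inheriting the corresponding block from another parent, which is the property the text attributes to recombination.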

Screening only ~400 colonies yielded eight clones
with activity significantly greater than the best of the
five parents; this yield of positives is at least 2[illegible]-fold
higher than we found by screening the genes with
point mutations alone (typically 1/1500).
Recombination can enhance directed evolution by making use of
the information present in a population of improved
enzymes produced by mutagenesis and screening,
information that would otherwise be discarded. Thus
far, we have improved the enzyme's specific activity
towards the antibiotic substrate more than 30-fold in
15% DMF. The total expressed activity is at least
50-fold greater than the original system we started with.
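
Taking the quoted figures at face value (eight positives from roughly 400 recombinant colonies, against a typical 1 in 1500 for point mutants), the ratio of hit rates can be worked out exactly. The helper name is illustrative, and the colony count of 400 is approximate in the text.

```python
from fractions import Fraction


def fold_enrichment(hits_a: int, screened_a: int, hits_b: int, screened_b: int) -> Fraction:
    """Ratio of hit rate A to hit rate B, kept exact with Fractions."""
    return Fraction(hits_a, screened_a) / Fraction(hits_b, screened_b)


if __name__ == "__main__":
    ratio = fold_enrichment(8, 400, 1, 1500)
    print(f"recombination hit rate / point-mutation hit rate = {ratio}")
```

With these inputs the hit rates are 1/50 and 1/1500, so the ratio is 30.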

Sequencing of the genes coding for improved
enzymes once again allowed us to identify the amino
acid substitutions responsible for the observed
improvements in catalytic performance. Six effective
mutations are illustrated in Fig. 10, on a model of the
pNB esterase developed from the X-ray crystal
structure of a homologous enzyme (Moore and Arnold,
1996). As for the case of subtilisin, most of the
mutations are at or near the solvent-accessible surface.
Only one of the six is deeply buried. In contrast to
subtilisin, however, none of the effective amino acid
substitutions lie in segments of the esterase predicted
to interact directly with the bound substrate. It is
possible that the homology modeling yielded an
incorrect structure, and the mutations do interact with
the p-nitrobenzyl substrate. Or, it may be that the
amino acid substitutions sampled at positions
adjacent to the substrate were all deleterious, and small
improvements were only obtained by altering amino
acids further away. In any case, the mechanism(s) by
which these amino acid substitutions enhance the
catalytic activity of the evolved pNB esterases are subtle
and would have been very difficult to predict in advance.

CONCLUSIONS 

The directed evolution approach clearly allows us
to engineer enzymes with novel functions and
features. In contrast to 'rational' design approaches,
directed evolution can be applied even when very little
is known about an enzyme's structure or catalytic
mechanism. Since the vast majority of proteins remain
largely uncharacterized, this marks a huge advantage
for the evolutionary methods. This approach, because
it allows us to explore novel solutions to protein
design problems, also promises to teach us a great
deal about protein structure and function.

Future research in directed evolution will include
development of large-scale screening methods, so that
efficient searches of large mutant libraries can be
performed. The construction of optimized mutant libraries
will also decrease the need for screening. In addition to
streamlining efforts to 'tune' enzymes, these
improvements will allow larger leaps, such as the evolution of
new catalytic activities, to take place. Significant
improvements in the ease and power of directed evolution
will also come from optimizing the search strategies.
The many similarities to optimization problems in
other fields make this a fertile ground for collaborative
efforts among theoreticians and experimentalists from
a wide range of engineering disciplines.
Fig. 9. Recombination of mutations by gene shuffling. (a)
Fig. 9. (b) Top polyacrylamide electrophoresis gel shows the separation of the digested gene fragments by size. Fragments
200-300 base pairs long were recovered by extracting the excised gel segment. These were reassembled into the full-length
gene (bottom gel, lanes 3-[illegible] and 6-8). First lane on left is a 'ladder' of DNA of known molecular weights.
The class A β-lactamase PER-1, which displays 26%
identity with the TEM-type extended-spectrum
β-lactamases (ESBLs), is characterized by a substrate
profile similar to that conferred by these latter enzymes.
The role of residues Ala164, His170, Ala171, Asn179,
Arg220, Thr237 and Lys242, found in PER-1, was assessed
by site-directed mutagenesis. Replacement of Ala164 by
Arg yielded an enzyme with no detectable β-lactamase
activity. Two other mutants, N179D and A164R+N179D,
were also inactive. Conversely, a mutant with the A171E
substitution displayed a substrate profile very similar to
that of the wild-type enzyme. Moreover, the replacement
of Ala171 by Glu in the A164R enzyme yielded a double
mutant which was active, suggesting that Glu171 could
compensate for the deleterious effect of Arg164 in the
A164R+A171E enzyme. A specific increase in kcat for
cefotaxime was observed with H170N, whereas R220L and
T237A displayed a specific decrease in activity towards the
same drug and a general increase in affinity towards
cephalosporins. Finally, the K242E mutant displayed a
kinetic behaviour very similar to that of PER-1. Based on
three-dimensional models generated by homology
modelling and molecular dynamics, these results suggest novel
structure-activity relationships in PER-1, when compared
with those previously described for the TEM-type ESBLs.
Keywords: β-lactamase/expanded-spectrum cephalosporins/
homology modelling/PER-1/serine enzyme



Introduction 

PER-1 is a class A β-lactamase characterized by a high
catalytic activity against expanded-spectrum cephalosporins
[e.g. cefotaxime (CTX) and ceftazidime (CAZ)] and
monobactams [e.g. aztreonam (AZT)]. The enzyme, which was first
identified in Pseudomonas aeruginosa (Nordmann et al., 1993),
displays a kinetic behaviour very similar to that of various
extended-spectrum β-lactamases (ESBLs) belonging to the
TEM and SHV families (Jacoby and Medeiros, 1991). In
contrast, PER-1 shares a relatively low amino acid identity
with these latter enzymes, e.g. 26% with TEM-3 (Nordmann
and Naas, 1994).

Recently, we have undertaken biochemical studies in order
to elucidate the molecular basis of PER-1 activity against
expanded-spectrum cephalosporins (Bouthors et al., 1998).
Molecular modelling and site-directed mutagenesis were used
to investigate in this enzyme the role played by the amino
acid residues corresponding to those found at positions 104,
164, 238 and 240 in the TEM-type ESBLs. In brief, two
residues, Asn104 and Ala164, were shown to be important for
the activity of PER-1. Asn104, which corresponds to the lysine
residue found at the same position in various TEM-type
ESBLs, would be connected to the key catalytic residue
Glu166 via a hydrogen bond network, whereas Ala164, which
corresponds to a highly conserved arginine in the Ω-loop of
class A β-lactamases described so far (Ambler et al., 1991),
could play an important structural role. By contrast,
modification of the serine residue found at position 238 in PER-1,
which is an amino acid found specifically in a large number
of TEM-type ESBLs (Bush and Jacoby, 1997), resulted in no
significant modification of the activity of PER-1 against
expanded-spectrum cephalosporins. Similarly, Gly240 in
PER-1 was shown to have no essential role in the substrate profile
of the enzyme. Finally, the catalytic residue Glu166, found in
all class A β-lactamases, appeared to be essential to the
β-lactamase activity of PER-1. However, an unexpected
residual activity against CAZ and AZT was observed for a
mutant in which Glu166 was replaced by Ala, suggesting that
other residues in PER-1 could contribute to the high activity
of the enzyme against expanded-spectrum cephalosporins.

In this work, we investigated other amino acid residues found
either within or in the vicinity of the PER-1 active site:
Ala164, His170, Ala171 and Asn179, which are located within the
putative Ω-loop of PER-1; Thr237, which is found at the end of
the β3 strand and is likely to participate in the formation of
the oxyanion hole (Herzberg and Moult, 1987; Strynadka et al.,
1992); Arg220, which is located in a position similar to that of
Arg244 found on strand β4 in TEM-1 and which could contribute to
the stabilization of the oxyanion pocket via hydrogen-bonding
interactions with strand β3 (Moews et al., 1990;
Jacob-Dubuisson et al., 1991); and Lys242, found in the loop
connecting strands β3 and β4. All these residues were modified
by site-directed mutagenesis and the kinetic properties of the
resulting mutants were characterized. By using homology
modelling and molecular dynamics simulations, we have attempted
to interpret at the structural level the kinetic data obtained
for some of the Ω-loop mutants.

Materials and methods 

Chemicals 

Antibiotic powders were provided by the following manufacturers:
penicillin G, Laboratoires de Thérapeutique Moderne (Suresnes,
France); ampicillin, cephalothin and kanamycin, Sigma Chemical
(St Louis, MO, USA); cefotaxime, Laboratoires Roussel (Paris,
France); nitrocefin and ceftazidime, Glaxo (Paris, France); and
aztreonam, Bristol Myers Squibb (Paris-La Défense, France).

Table I. Nucleotide sequence of the oligonucleotides used in site-directed
mutagenesis

Amino acid       Oligonucleotide sequence(a)
modification

Ala164 -> Arg    5'-CATCTGCGCTTCATTTCGGACCACAGCGGTCTC-3'
His170 -> Asn    5'-CACCTGATCATCGGCGTTCATCTGCGCTTCATT-3'
Ala171 -> Glu    5'-CTGCACCTGATCATCTTCGTGCATCTGCGCTTC-3'
Asn179 -> Asp    5'-TTTCATCGAGGTCCAGTCTTGATACTGCACCTG-3'
Arg220 -> Leu    5'-TAACAAACCTTTTAACAGCTCTGGTCCTGTGGT-3'
Thr237 -> Ala    5'-GGCTTTGATACCCGAAGCACCAGTTTTATGTG-3'
Lys242 -> Glu    5'-CGCAGTTTTTCCGGCTTCGATACCCGAAGTACC-3'

(a) Specific base changes are underlined.



The restriction enzymes used in this study were obtained from
Boehringer Mannheim (Meylan, France) and T4 DNA ligase from
Promega (Madison, WI, USA). [32P]dCTP was purchased from
Isotopchim (Ganagobie, France).

Escherichia coli strains, plasmids and growth conditions

E.coli CJ236 (Kunkel et al., 1987) and MV1190 (McClary et al.,
1989) were used as hosts for phages in site-directed mutagenesis
experiments. E.coli JM109 (Promega) was used for DNA cloning
experiments and for expression of blaPER-1 and the corresponding
mutant genes.

The recombinant plasmid pRAZ1, encoding blaPER-1, has been
described by Nordmann et al. (1993). Bacteriophage M13mp19
(Messing, 1983) was used as a vector in site-directed
mutagenesis experiments. Plasmid pK19 (kanamycin-resistant)
(Pridmore, 1987) was used in cloning experiments.

E.coli MV1190 and JM109 were grown at 37°C in Luria-Bertani (LB)
(Difco, Detroit, MI, USA) and brain-heart infusion (BHI)
(Difco), respectively. Solid media were obtained by the addition
of 2% Bacto-Agar (Difco). Kanamycin (25 µg/ml) and ampicillin
(100 µg/ml) were added when necessary. Competent E.coli cells
were prepared and transformed as described by Chung et al.
(1989).

Nucleic acid techniques 

Plasmid DNA was purified using either the alkaline lysis method
for mini-preparations (Birnboim and Doly, 1979) or the Qiagen
plasmid kit for maxi-preparations (Qiagen, Hilden, Germany).
Isolation of single-stranded DNA and other standard DNA
manipulations were carried out according to Sambrook et al.
(1989). Double- and single-stranded DNA sequencing were carried
out by the dideoxynucleotide chain termination method (Sanger
et al., 1977) using the T7 Sequencing kit (Pharmacia Biotech,
Saint Quentin en Yvelines, France).

Site-directed mutagenesis 

Site-directed mutagenesis experiments were performed as
described previously (Bouthors et al., 1998). In brief, the
blaPER-1 gene was excised from pRAZ1 (1.3 kb) and introduced
into M13mp19 RF. Site-directed mutagenesis was performed using
the uracil template procedure of Kunkel et al. (1987). The
sequences of the synthetic phosphorylated oligonucleotides
(Eurogentec, Liege, Belgium) used to introduce the different
mutations in the blaPER-1 gene are listed in Table I. After
mutagenesis, each mutant gene was cloned into plasmid pK19 and
the recombinant plasmids thus obtained were introduced by
transformation into E.coli JM109. The mutant genes were all
sequenced in their entirety and on both strands.
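
A mutagenic primer from Table I can be read back on the coding strand to check which codon it changes. The sketch below is illustrative only. It assumes, as our reading rather than a statement of the text, that the oligonucleotides are listed as the antisense strand and that the reverse complement is in frame from its first base; the helper names are hypothetical.

```python
# Illustrative sketch: reverse-complement a mutagenic primer and translate it.
# Assumption (not stated in the source): the primer is antisense and its
# reverse complement is in frame from the first base.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, with codons enumerated in TCAG order.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the first base; ignore a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# His170 -> Asn primer as printed in Table I.
his170_asn_primer = "CACCTGATCATCGGCGTTCATCTGCGCTTCATT"
coding_strand = reverse_complement(his170_asn_primer)
peptide = translate(coding_strand)
print(coding_strand)
print(peptide)
```

Under the stated assumption, the sixth codon of the coding strand (AAC) would encode the asparagine introduced at position 170, and the residue two positions downstream would be an aspartate, in line with the Asp172 discussed later in the text.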



Expression and purification of the wild-type and mutant
β-lactamases

The wild-type and mutant enzymes were purified from 1 l cultures
by a two-step procedure based on an anion-exchange column
followed by a gel filtration, as described previously (Bouthors
et al., 1998). The mutant β-lactamases displaying significant
activity were detected using the chromogenic cephalosporin
nitrocefin (O'Callaghan et al., 1972), while the almost inactive
enzymes A164R, N179D and A164R+N179D were identified by
electrophoresis on 12% SDS-polyacrylamide gels (Laemmli, 1970),
with the wild-type PER-1 β-lactamase as a molecular mass
reference. In order to avoid concerns about enzyme stability,
kinetic studies were performed shortly after purification. The
purity of the different enzymes was assessed by Coomassie Blue
staining of SDS-polyacrylamide gels after electrophoresis.
Protein concentration was determined by measuring the absorbance
at 280 nm (Lorber and Giege, 1992) with an ε value of
34 850 M⁻¹.cm⁻¹ (Bouthors et al., 1998). For mutants exhibiting
more than one protein band on SDS-PAGE analysis, the intensity
of the β-lactamase band was measured with a computerized
densitometer (Densylab, Bioprobe) and the enzyme concentration
was determined with reference to a standard BSA scale analyzed
in the same conditions.
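
The absorbance-based concentration estimate follows the Beer-Lambert relation c = A / (ε·l). The lines below are a minimal sketch of that arithmetic using the ε value quoted above; the 1 cm path length and the absorbance reading are assumptions chosen for illustration, and the function name is hypothetical.

```python
# Illustrative sketch of the Beer-Lambert calculation, c = A / (epsilon * l).
EPSILON_280 = 34850.0  # M^-1 cm^-1, the value quoted in the text


def protein_concentration_molar(a280: float, path_cm: float = 1.0,
                                epsilon: float = EPSILON_280) -> float:
    """Return the molar concentration for an absorbance reading at 280 nm.

    The 1 cm default path length is an assumption, not taken from the source.
    """
    if path_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and epsilon must be positive")
    return a280 / (epsilon * path_cm)


# Hypothetical reading, used only to show the unit conversion to micromolar.
conc_uM = protein_concentration_molar(0.697) * 1e6
print(f"{conc_uM:.1f} uM")
```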

Isoelectric focusing 

Isoelectric focusing was performed with an LKB Multiphor
apparatus with pH 3.5-9.5 PAG plates (Pharmacia Biotech). Gels
were focused at 30 W for 90 min at 10°C. β-Lactamase activity
was revealed by staining with the nitrocefin assay.

Determination of the kinetic parameters of the wild-type and 
mutant enzymes 

Kinetic assays were performed spectrophotometrically in 0.1 M
sodium phosphate buffer (pH 7.0) at 30°C on a Uvikon 940
spectrophotometer. The wavelengths and the extinction
coefficients used were as follows: penicillin G, 232 nm,
Δε = -1100 M⁻¹.cm⁻¹; cephalothin, 262 nm, Δε = -7960 M⁻¹.cm⁻¹;
cefotaxime, 260 nm, Δε = -6710 M⁻¹.cm⁻¹; ceftazidime, 260 nm,
Δε = -8660 M⁻¹.cm⁻¹; and aztreonam, 318 nm, Δε = -650 M⁻¹.cm⁻¹.
For each antibiotic, initial rates were measured at six
different substrate concentrations. Kinetic parameters were
determined by fitting the Michaelis-Menten equation to the
experimental data using the regression analysis program LEONORA
written by Cornish-Bowden (1995). The values for kcat and Km
were estimated using a non-linear least-squares regression
method with dynamic weights (Cornish-Bowden, 1995).
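
The fitting step can be illustrated with a small least-squares sketch. It is not the LEONORA procedure: the dynamic weighting is omitted, and the search strategy (a golden-section search over Km, with Vmax solved in closed form at each Km) is our own simplification. The six substrate concentrations and the noise-free rates are synthetic, and all function names are hypothetical. Dividing the fitted Vmax by the enzyme concentration would give kcat.

```python
# Illustrative sketch: unweighted least-squares fit of v = Vmax * S / (Km + S).
# Not the LEONORA procedure; dynamic weighting is omitted for brevity.
import math


def best_vmax(km, s_values, v_values):
    """For a fixed Km the model is linear in Vmax, so solve for it directly."""
    x = [s / (km + s) for s in s_values]
    return sum(xi * vi for xi, vi in zip(x, v_values)) / sum(xi * xi for xi in x)


def sum_squared_error(km, s_values, v_values):
    """Residual sum of squares at a given Km, with Vmax at its optimum."""
    vmax = best_vmax(km, s_values, v_values)
    return sum((v - vmax * s / (km + s)) ** 2 for s, v in zip(s_values, v_values))


def fit_michaelis_menten(s_values, v_values, km_lo=1e-3, km_hi=1e5, iters=200):
    """Golden-section search over log(Km); returns (Km, Vmax)."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = math.log(km_lo), math.log(km_hi)
    for _ in range(iters):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if (sum_squared_error(math.exp(c), s_values, v_values)
                < sum_squared_error(math.exp(d), s_values, v_values)):
            b = d  # the minimum lies in [a, d]
        else:
            a = c  # the minimum lies in [c, b]
    km = math.exp((a + b) / 2)
    return km, best_vmax(km, s_values, v_values)


# Synthetic, noise-free data: six hypothetical concentrations (uM) and rates
# generated with Km = 441 uM and a maximal rate of 41 (arbitrary units).
substrate = [50.0, 100.0, 200.0, 400.0, 800.0, 1600.0]
rates = [41.0 * s / (441.0 + s) for s in substrate]
km_fit, vmax_fit = fit_michaelis_menten(substrate, rates)
print(f"Km = {km_fit:.1f}, Vmax = {vmax_fit:.2f}")
```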

Molecular modelling 

The refined theoretical three-dimensional structures of PER-1
and the mutant enzymes were constructed by homology modelling
using the computer program Swiss-Model (Peitsch, 1996), as
described previously (Bouthors et al., 1998). The models were
then subjected to 5000 steps of energy minimization using the
Powell minimizer of X-PLOR (Brunger, 1988). The Ω-loop region in
the resulting minimized structures was subjected to molecular
dynamics simulations in vacuum. The molecular dynamics were
initially performed on the 150-190 region of PER-1 containing
the Ω-loop (residues 161-179) and the two α-helix regions
enclosing the loop (residues 150-160 and 180-190, respectively).
The results obtained from this large segment indicated that the
two α-helix regions enclosing the loop were very stable
(r.m.s.d. = 0.2 Å). Therefore, the molecular dynamics
simulations were subsequently confined to the region
encompassing residues 160-180, using the following simulation
procedure: the target temperature started at 0 K to reach the
final temperature, 300 K, within 18 ps. After 30 ps of
stabilization at 300 K, the molecular dynamics phase lasted
100 ps at 300 K, with a time step of 0.001 ps and a dielectric
constant (ε) of 4.0. The conformations trapped at 300 K were
visualized by using the VMD (Visual Molecular Dynamics) program
(Humphrey et al., 1996). The mean of the conformations, which
was subjected to 500 steps of energy minimization, was used in
structure comparison.
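
The simulation protocol translates directly into integration step counts. The following sketch only restates the durations and time step quoted above; the linear temperature ramp used for the heating rate is an assumption, and the phase labels are ours.

```python
# Illustrative sketch: integration steps implied by the quoted MD protocol.
TIME_STEP_PS = 0.001  # time step quoted in the text

# Phase durations in ps, as quoted in the text; the labels are ours.
PHASES_PS = {
    "heating 0 -> 300 K": 18,
    "stabilization at 300 K": 30,
    "production at 300 K": 100,
}

steps = {name: round(duration / TIME_STEP_PS) for name, duration in PHASES_PS.items()}
total_steps = sum(steps.values())

# Average heating rate, assuming a linear ramp (an assumption, not stated).
heating_rate_k_per_ps = 300 / PHASES_PS["heating 0 -> 300 K"]

for name, count in steps.items():
    print(f"{name}: {count} steps")
print(f"total: {total_steps} steps")
```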

Results 

Production and purification of the mutant enzymes 

Site-directed mutagenesis was used to replace the amino acid
residues located at positions 164, 170, 171, 179 and 237 in
PER-1 by those found at the same positions in TEM-1, which has
no significant activity against expanded-spectrum
cephalosporins. Thus, the single mutants A164R, H170N, A171E,
N179D and T237A, and also two double mutants, A164R+A171E and
A164R+N179D, were constructed. In addition, Lys242 in PER-1,
which could be the counterpart of the lysine residue found at
position 240 in various TEM-ESBLs (Bush and Jacoby, 1997), was
replaced by a glutamic acid residue, as found in TEM-1 at
position 240 (Sutcliffe, 1978). Finally, Arg220 in PER-1, which
is equivalent to Arg244 located on the β4 strand in TEM-1, was
replaced by a leucine. SDS-PAGE analysis of crude extracts
showed that all the mutant β-lactamases except three were
expressed in normal amounts (data not shown). Indeed, when
compared with the wild-type enzyme, the A164R, N179D and
A164R+N179D mutants were expressed at very low levels and the
various purification attempts carried out in order to determine
the kinetic features of the three mutants remained unsuccessful.
Production and purification of the other enzymes, which were all
active, were performed as described previously (Bouthors et al.,
1998).

Isoelectric focusing, carried out on the purified enzymes,
indicated that three mutants, A164R+A171E, H170N and T237A,
displayed pI values indistinguishable from that of the wild-type
protein (pI = 5.4). Conversely, the pI values found for the
R220L, A171E and K242E mutants were shifted towards more acidic
values (pI = 5.2, 5.0 and 4.9, respectively) (data not shown).
Finally, the isoelectric points of the three mutants A164R,
N179D and A164R+N179D could not be determined since the
corresponding crude extracts contained no significant
β-lactamase activity.

Kinetic analysis 

The steady-state kinetic parameters kcat and Km for penicillin
G, cephalothin, cefotaxime (CTX), ceftazidime (CAZ) and
aztreonam (AZT) were determined from the purified active
β-lactamases. The values obtained are shown in Table II.

Wild-type β-lactamase. As expected, the values of the rate
constants obtained for the wild-type PER-1 β-lactamase were
similar to those reported previously (Bouthors et al., 1998).
The enzyme was characterized by a high apparent affinity for
penicillin G, cephalothin and AZT (Km values ranging from 23 to
147 µM), but a poor apparent affinity for the expanded-spectrum
cephalosporins CTX and CAZ (441 and 4150 µM, respectively).
Conversely, the kcat values for the last two drugs (41 and
109 s⁻¹, respectively) were markedly higher than those found for
the other β-lactam antibiotics (kcat values ranging between 8
and 11 s⁻¹) (Table II).
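
The kcat/Km ratios in Table II are reported in s⁻¹.mM⁻¹ while Km is given in µM, so a factor of 1000 enters the ratio. The short sketch below reproduces the wild-type values for CTX and CAZ from the kcat and Km figures quoted above; the function name is hypothetical.

```python
# Illustrative sketch: catalytic efficiency in the units used in Table II.
def catalytic_efficiency(kcat_per_s: float, km_uM: float) -> float:
    """Return kcat/Km in s^-1.mM^-1 from kcat (s^-1) and Km (uM)."""
    if km_uM <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / (km_uM / 1000.0)


# Wild-type PER-1 values quoted in the text: (kcat in s^-1, Km in uM).
wild_type = {"CTX": (41.0, 441.0), "CAZ": (109.0, 4150.0)}
efficiency = {drug: catalytic_efficiency(kcat, km) for drug, (kcat, km) in wild_type.items()}
for drug, value in efficiency.items():
    print(f"{drug}: {value:.0f} s^-1.mM^-1")
```

The high kcat for CAZ is thus offset by its very high Km, which leaves CAZ with the lowest catalytic efficiency of the two.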

Mutants of residues located in the Ω-loop. Four positions were
investigated at the level of the Ω-loop region of the protein:
164, 170, 171 and 179. Four single mutants (A164R, H170N, A171E
and N179D) and two double mutants (A164R+A171E and A164R+N179D)
were analyzed, as described below.

Mutants A164R, A171E and A164R+A171E. As observed previously
(Bouthors et al., 1998), no significant enzymatic activity was
detected with the A164R mutant. By contrast, the substitution of
the alanine residue found at position 171 in PER-1 by a
glutamate resulted in no significant modifications of the kcat
and Km values, when compared with the wild-type enzyme
(Table II). Similarly, the double mutant A164R+A171E yielded an
active enzyme which showed kcat and Km values similar to those
of PER-1, but the kcat/Km ratios for CTX, CAZ and AZT were
increased by at least an order of magnitude (Table II).

Mutants N179D and A164R+N179D. Position 179 is well conserved in
class A β-lactamases, where an aspartate residue is generally
found (Table III). The mutation Asn179 -> Asp in PER-1, either
in the N179D mutant or in the double mutant A164R+N179D,
resulted in a complete loss of activity and the corresponding
enzymes could not be purified.

Mutant H170N. A histidine residue is found at position 170 in
PER-1, instead of the highly conserved Asn170 found in most of
the class A β-lactamases described so far (Ambler et al., 1991)
(Table III). For penicillin G, cephalothin, CAZ and AZT, the
H170N mutant displayed kcat and Km values similar to those of
PER-1 (Table II). By contrast, a marked increase in kcat was
observed for CTX (5.5-fold) with a concomitant decrease in the
apparent affinity (~3-fold), thus resulting in a 2-fold increase
in kcat/Km.
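
The fold changes quoted for H170N with CTX can be recomputed from the Table II entries. The sketch below is a plain ratio calculation; the dictionary layout is ours.

```python
# Illustrative sketch: fold changes for H170N versus wild-type PER-1 with
# cefotaxime, using the Table II values (Km in uM, kcat in s^-1,
# kcat/Km in s^-1.mM^-1).
WILD_TYPE = {"Km": 441.0, "kcat": 41.0, "kcat/Km": 93.0}
H170N = {"Km": 1286.0, "kcat": 225.0, "kcat/Km": 175.0}

# A ratio above 1 for Km means a higher Km, i.e. a lower apparent affinity.
fold = {name: H170N[name] / WILD_TYPE[name] for name in WILD_TYPE}
for name, value in fold.items():
    print(f"{name}: {value:.1f}-fold")
```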

Mutants of residues located in the α/β domain. Three positions
were studied in the α/β domain: position 220, position 237 on
strand β3 and position 242 on the loop connecting β3 and β4
(Bouthors et al., 1998).

Mutant R220L. An arginine is found at position 220 in PER-1
(Table III). This residue might be the equivalent of Arg244 in
the TEM enzymes, as previously suggested (Matagne and Frere,
1995). Replacement of Arg220 by a leucine yielded a mutant
(R220L) displaying no significant modifications of the kinetic
parameters for penicillin G and cephalothin. For the other drugs
(CTX, CAZ and AZT), a general increase in apparent affinity was
observed. In addition, a significant decrease in kcat for CTX
was noticed (3.2-fold) (Table II).

Mutant T237A. As in the ESBLs TEM-5 and TEM-24 (Sougakoff
et al., 1989; Chanal et al., 1992), position 237, which
contributes to the oxyanion pocket and corresponds to an alanine
in TEM-1, is occupied by a threonine in PER-1 (Table III). The
replacement of Thr237 by Ala yielded an enzyme which exhibited a
higher apparent affinity for most of the substrates tested,
particularly for CTX and AZT (Km values lowered by 40- and
10-fold, respectively). By contrast, specific and divergent
variations of kcat were observed for CAZ (6-fold increase) and
CTX (4-fold decrease), but, overall, the kcat/Km ratio for all
the substrates tested was markedly increased.

Mutant K242E. Lysine 242, which would be located in PER-1 on a
large loop connecting strands β3 and β4 (Bouthors et al., 1998),
could be the counterpart of Lys240 found in

Table II. Kinetic parameters(a) for hydrolysis of β-lactam antibiotics by PER-1 and the corresponding mutants

Enzyme       Penicillin G            Cephalothin             Cefotaxime              Ceftazidime             Aztreonam
             Km      kcat    kcat/Km Km      kcat    kcat/Km Km      kcat    kcat/Km Km      kcat    kcat/Km Km      kcat    kcat/Km

PER-1        27      8       296     23      8       348     441     41      93      4150    109     26      147     11      75
             ±3      ±0.2    ±22     ±1      ±0.1    ±17     ±42     ±2      ±8      ±611    ±15     ±8      ±15     ±0.7    ±15
A164R        ND(b)   ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND
A171E        49      9       184     47      14      298     284     39      137     4309    134     31      42      5       119
             ±2      ±0.2    ±13     ±2      ±0.2    ±20     ±6      ±0.4    ±4      ±95     ±3      ±1      ±1      ±0.1    ±5
N179D        ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND
A164R+A171E  23      9       391     27      12      444     300     61      203     2087    123     59      45      15      333
             ±0.01   ±0.01   ±0.2    ±1      ±0.2    ±25     ±19     ±2      ±20     ±207    ±10     ±11     ±4      ±0.5    ±39
A164R+N179D  ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND      ND
H170N        15      4       266     10      4       400     1286    225     175     4051    150     37      78      14      179
             ±0.7    ±0.04   ±15     ±0.4    ±0.05   ±9      ±85     ±11     ±8      ±616    ±19     ±10     ±9      ±1      ±19
R220L        15      10      666     10      4       400     167     13      78      1329    83      62      39      6       154
             ±0.4    ±0.06   ±47     ±0.3    ±0.03   ±14     ±8      ±0.3    ±5      ±33     ±1      ±3      ±3      ±0.1    ±14
T237A        22      11      500     7       15      2143    10      10      1000    2181    678     311     14      18      1286
             ±0.4    ±0.05   ±11     ±0.3    ±0.1    ±117    ±0.7    ±0.2    ±94     ±232    ±56     ±59     ±1      ±0.2    ±110
K242E        34      8       235     36      12      333     873     94      108     3672    147     40      104     16      154
             ±0.5    ±0.03   ±4      ±0.6    ±0.06   ±7      ±0.7    ±0.05   ±0.1    ±31     ±1      ±0.6    ±5      ±0.4    ±11

(a) Units for Km, kcat and kcat/Km are µM, s⁻¹ and s⁻¹.mM⁻¹, respectively. Values of standard errors are indicated below the kinetic values.
(b) ND, not detectable (kcat < 0.05 s⁻¹).



Table III. Multiple sequence alignment of β-lactamases PER-1, TEM-1,
TEM-5, SHV-8, Streptomyces albus G and Staphylococcus aureus PC1

β-Lactamase     Amino acid at position(a)                  Reference
                164   170   171   179   220   237   240

PER-1           A     H     A     N(b)  R     T     K(c)   Nordmann and Naas (1994)
TEM-1           R     N     E     D     L     A     E      Sutcliffe (1978)
TEM-5           S     N     E     D     L     T     K      Sougakoff et al. (1989)
SHV-8           R     N     E     N     L     A     E      Rasheed et al. (1997)
S. albus G      R     N     S     D     R     Q     R      Dehottay et al. (1987)
S. aureus PC1   R     N     Y     D     L     A     I      East and Dyke (1989)

(a) Numbering according to Ambler et al. (1991).
(b) Boldface letters indicate the amino acids shared by PER-1 and other enzymes.
(c) Residue number 242 in PER-1.



various ESBLs displaying a high activity against CAZ and AZT
(Bush and Jacoby, 1997) (Table III). This residue was replaced
by a glutamic acid, which is the residue found at position 240
in TEM-1 (Table III). As shown in Table II, the steady-state
kinetic parameters determined from the K242E mutant were nearly
identical with those measured from PER-1.

Discussion 

We have described the catalytic behaviour of various PER-1 
mutants in which residues 164, 170, 171, 179, 220, 237 and 
242 were modified. 

In PER-1, an alanine residue is found at position 164 instead of
the highly conserved arginine identified in the other class A
β-lactamases (Ambler et al., 1991). As reported above and as
observed previously (Bouthors et al., 1998), replacement of
Ala164 by Arg in PER-1 resulted in a mutant protein which could
not be detected on SDS-PAGE analysis and which displayed no
detectable β-lactamase activity. In order to explain such a
result, theoretical three-dimensional models of the class A
β-lactamase PER-1 and the corresponding mutant A164R were
constructed and compared with each other (Figure 1A). Despite
the relatively low degree of identity found at the amino acid
level between PER-1 and the other class A β-lactamases, homology
modelling was used to generate the model structures of PER-1 and
the A164R enzyme because it is now well established that class A
β-lactamases form a superfamily of enzymes that are all
characterized by a very similar structural organization,
particularly at the level of the active site (Joris et al.,
1991). Molecular dynamics simulations were then performed from
the models in order to assess the extent of the conformational
modifications that could occur in the Ω-loop region of the
mutant by comparison with that of the wild-type enzyme. Based on
the results obtained, the Ω-loop region in PER-1 appears to be
characterized by fairly high flexibility (data not shown). Such
a result could be related to the fact that the PER-1 Ω-loop is
not stabilized by several ionic-bonding interactions, thus
contrasting with the four salt bridges found in TEM-1 between
the Ω-loop residues Arg161 and Asp163, Arg164 and Glu171, Arg164
and Asp179, and Asp176 and Arg178 (Jelsch et al., 1993).
Therefore, it is likely that the Ala164 -> Arg substitution
induces in PER-1 significant conformational modifications at the
level of the Ω-loop. Accordingly, the topology of the main-chain
atoms between residues 171-179 is significantly different in the
A164R mutant, when compared with PER-1 (r.m.s.d. = 0.6 Å)
(Figure 1A). In the wild-type enzyme model, the Ω-loop
conformation is generally wider than in the mutant structure,
the side chain of Asp172 being oriented outwards from the loop.
By contrast, in the A164R enzyme, the bulky side chain of Arg164
would point into the Ω-loop and, due to a putative salt bridge
bonding interaction, the Asp172 side chain would be reoriented
towards that of Arg164 (Figure 1A). Such a salt bridge cannot be
established without a significant conformational modification of
the 172 region (Figure 1A), which accounts for the instability
and the loss of activity of the mutant enzyme.
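
The r.m.s.d. figures quoted for the main-chain atoms are root-mean-square deviations between matched atom positions. A minimal sketch of that calculation is given below; it assumes the two structures have already been superimposed, and the coordinates are toy values invented for illustration, not taken from the models.

```python
# Illustrative sketch: r.m.s.d. between two pre-superimposed coordinate sets.
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms (hypothetical): the second set is the first
# one shifted by 0.6 along y, so the expected deviation is 0.6.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(x, y + 0.6, z) for x, y, z in reference]
toy_rmsd = rmsd(reference, shifted)
print(f"{toy_rmsd:.2f}")
```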

Contrasting with the behaviour of the A164R mutant, the
substitution in PER-1 of Ala171 by a glutamic acid yielded an
enzyme characterized by kinetic parameters very similar to those
obtained from the wild-type enzyme. Moreover, it is noteworthy
that introduction of the Ala171 -> Glu mutation in the inactive
mutant A164R yielded a double mutant, A164R+A171E, which was
fully active and exhibited kcat and Km values nearly identical
with those of PER-1 and A171E (Table II). Strikingly, in the
minimized mean of the A164R+A171E structure model obtained by
molecular dynamics simulations, the side chain of Glu171 could
be adequately oriented to be hydrogen bonded and/or to form a
strong ionic bond with the side chain of Arg164, while that of
Asp172 would consequently be oriented outwards from the Ω-loop,
i.e. in a position nearly identical with that found in the
wild-type PER-1 enzyme (Figure 1B), explaining why the double
mutant remained fully active despite the presence of Arg164.

Fig. 1. Ribbon models of PER-1 (blue) and the two mutants A164R (red) and A164R+A171E (yellow). (A) Superposition of PER-1 and A164R;
(B) superposition of PER-1 and A164R+A171E; (C) model of strands β3 and β4 in wild-type PER-1. In (A) and (B), the models represent the mean of the
conformations obtained by dynamic simulations of the 160-180 region encompassing the Ω-loop (residues 161-179). Model (C) was obtained by homology
modelling and energy minimization, as described in the Materials and methods section. Thin colour sticks: side chains of residues 70, 164, 166, 170, 171, 172,
220 and 237. Green dotted lines: hydrogen bonds. The figures were generated by using the program InsightII from Molecular Simulations.

Two other amino acids (Asn179 and His170) were investigated in
the Ω-loop of PER-1. The asparagine residue, found at position
179 in PER-1 as in the ESBL SHV-8 (Rasheed et al., 1997)
(Table III), was initially thought to play a specific role in
the activity of PER-1 against expanded-spectrum cephalosporins.
Unexpectedly, replacement of Asn179 by an aspartate, which is a
residue conserved in a large number of class A β-lactamases, was
highly deleterious for the overall β-lactamase activity of the
two PER mutants N179D and A164R+N179D (Table II). It must be
pointed out that the interaction between residues 164 and 179,
which links the two ends of the Ω-loop region in class A
β-lactamases, is important for a suitable positioning of the key
catalytic residue Glu166 (Knox, 1995; Matagne et al., 1998).
Therefore, it is tempting to speculate that the presence of an
aspartate residue at position 179 in the inactive mutants N179D
and A164R+N179D could alter significantly the position of Glu166
and, thereby, the β-lactamase activity.

The histidine found at position 170 in PER-1 corresponds to a
highly conserved asparagine residue in the other class A
β-lactamases (Ambler et al., 1991) (Table III). Unexpectedly,
the kinetic parameters exhibited by the H170N mutant were
similar to those obtained from PER-1, except for a 5.5-fold
increase in the kcat value for CTX with a concomitant decrease
in the apparent affinity for this antibiotic. Palzkill et al.
(1994) have reported that the replacement of the highly
conserved Asn170 by a histidine in TEM-1 yielded an active
enzyme showing unmodified catalytic constants. Taken altogether,
these data suggest that His170 is not a key residue for the
substrate profile of PER-1 and one can hypothesize that this
residue was present in the ancestor of the PER-1 β-lactamase and
has been conserved during the evolution process leading to
PER-1.

Three positions in PER-1 were investigated in the region of the
α/β domain forming one of the two edges of the active site.
Residue 237, located on the β3 strand, belongs to the so-called
oxyanion pocket and is involved in the binding of β-lactams
(Ghuysen, 1994; Matagne et al., 1998). In PER-1, a threonine is
found at position 237 (Figure 1C), which is located between the
KTG triad and Ser238. Strikingly, it has been previously
reported that various TEM-type ESBLs harbour an A237T
substitution (Bush and Jacoby, 1997). Moreover, another
hydroxylated residue (a serine) is found naturally at position
237 in the class A β-lactamase from Proteus vulgaris, which
displays a high catalytic activity against CTX, and it has been
shown that the substitution Ser237 -> Ala in this enzyme leads
to a decrease in the catalytic efficiency against this drug
(Tamaki et al., 1994). Therefore, the decrease in kcat observed
for CTX with the T237A mutant of PER-1 confirms that Thr237 is
important for the catalytic activity of PER-1 towards this drug.
However, the general increase in kcat/Km observed for the T237A
mutant of PER-1 against CTX, CAZ and AZT, which is due to a
general increase in apparent affinity towards cephalosporins,
was rather unexpected (Table II). Nonetheless, these results
were confirmed by modifying the arginine found at position 220
in PER-1. Indeed, according to the hypothetical model of PER-1
shown in Figure 1C, the side chain of Arg220 would point towards
the active site cavity and could be hydrogen-bonded to that of
Thr237. As a consequence of this structural organization, it is
likely that both residues contribute to adjusting the topology
of the oxyanion pocket, as previously suggested for other class
A β-lactamases (Matagne and Frere, 1995). In accordance with
such a model, the replacement of Arg220 by Leu in PER-1, which
leads to the loss of the hydrogen-bonding interactions between
residues 220 and 237, yielded a mutant enzyme (R220L) showing
kinetic properties similar to those exhibited by the T237A
mutant, i.e. a significant decrease in the catalytic activity
against CTX associated with a better apparent affinity for
expanded-spectrum cephalosporins and AZT (see Table II).

Finally, we also studied the lysine residue found at position
242 at the end of the β3 strand in PER-1 (Figure 1C), which
might be the counterpart of Lys240 found in various TEM-type
ESBLs (Bush and Jacoby, 1997). The replacement of Lys242 in
PER-1 by a glutamic acid residue, which is the residue found at
position 240 in TEM-1 (Table III), yielded a mutant enzyme with
kinetic properties very similar to those of PER-1. This result
indicates that Lys242 does not play in PER-1 a role equivalent
to that of the lysine found at position 240 in the TEM-type
ESBLs.

In conclusion, PER-1 is a class A ESBL which illustrates well
the fact that enzymes showing a high level of divergence in
their amino acid sequences can share very similar substrate
profiles. Furthermore, our results indicate that, in contrast to
the TEM-type ESBLs, the PER-1 activity towards expanded-spectrum
cephalosporins does not stem from the presence in the active
site of a limited number of residues having a specific role in
the hydrolysis of these drugs. The X-ray structure determination
of PER-1, which is in progress, will aid further understanding
of the structure-activity relationships of this peculiar class A
β-lactamase.

INTRODUCTION

Since the discovery of DNA elements controlling the initiation
of transcription by RNA polymerase II some ten years ago, it
has become evident that the frequency of transcription initiation
depends on proteins interacting with specific DNA elements of
gene regulatory regions. Especially during the last three years
an enormous number of such proteins, called transcription factors,
have been isolated and characterized.

In the following table, we present a listing of transcription
factors. It may serve as a dictionary of transcription factors, and
should help to identify putative regulatory DNA elements of
promoter regions that have not yet been analysed. To keep the
listing manageable, it was limited to vertebrate-encoded factors
regulating the expression of genes transcribed into mRNA, i.e. by
RNA polymerase II. An alphabetical order was chosen for the
listing, since otherwise most of the well characterized factors
would have had to be placed into more than one of the listed
categories, such as protein families (e.g. zinc finger proteins)
or factor families (e.g. steroid hormone receptor superfamily).
Names of synonyms or of homologues derived from other species are
listed in the second column. In the third column, the specific
regulatory DNA element (if possible, the consensus sequence) that
is recognized by the respective factor is given (please note that
the complementary sequence also represents a specific binding
site, since most transcription factors act in an
orientation-independent manner).

In the following columns, structural features, tissue specificity
and some general information on each factor are noted.
Unfortunately, the broad range of this listing left little space to
describe all important features of well characterized factors such
as AP1, NF-κB or Sp1. Also, we are sure to have missed some
important elements. Hence, the following table, and especially
the 'features' column, represents a rather personal and selective
view, which might restrict its usefulness for some purposes.
In order to avoid misinterpretations, readers are therefore
encouraged to refer to the original publications cited in the table,
and to references therein.

Legend to the table:

1) The species in which a factor has been identified is given in
superscript letters (b: bovine; c: calf; ch: chicken; f: frog; h:
human; ha: hamster; m: murine; r: rat; s: simian).

2) The names of factors listed here are synonyms of the names 
listed in the first column, or homologous factors found in other 
species, or both. The species in which the factors have been 
identified are given in superscript letters (cf. 1)) 

3) If possible, consensus sequences derived from several binding 
sites are given. Since most transcription factors act in an 
orientation-independent manner, the complementary 
sequence also represents a specific binding site (R: purine; P: 
pyrimidine; N: any nucleotide). 

4) The molecular weight (Mr) of the factors was determined by 
different methods as indicated by superscript letters (a: 
SDS-PAGE; b: gel filtration; c: estimated from the corresponding 
cDNA). If a factor exists in two distinct forms, the Mr of the 
two forms is shown with a slash (e.g. 45/50); if a factor exists 
in several forms, or if the size was not exactly determined, the 
range of the Mr is shown (e.g. 45 - 50); if a factor consists 
of two or more polypeptides, the Mr of the polypeptides is 
shown (e.g. 45 + 50). 

5) bHLH: helix-loop-helix protein containing a basic domain; 
bZIP: leucine zipper protein containing a basic domain; FHD: 
fork head domain; HD: homeodomain; HSH: helix-span-helix 
protein; POU: POU specific domain; zinc f.: zinc finger. 

6) Posttranslational modifications are given in brackets (Ph: 
phosphoprotein; O-gly: O-glycosylated). 

7) Compounds inhibiting the action of the respective factor are 
given in brackets. 
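
Point 3 of the legend says that a consensus sequence written with the degenerate symbols R, P and N also identifies a binding site when it occurs on the complementary strand. A minimal sketch of such a two-strand search is given below. The function names, the example consensus and the example sequences are hypothetical and chosen only for illustration; the symbol "P" for pyrimidine follows the legend rather than the IUPAC code "Y".

```python
# Degenerate symbols as defined in the legend. "P" (pyrimidine) is the
# legend's own notation; IUPAC would write "Y".
SYMBOLS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # purine
    "P": "CT",    # pyrimidine (legend notation)
    "N": "ACGT",  # any nucleotide
}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def matches_at(consensus, seq, start):
    """True if the consensus matches seq beginning at index start."""
    window = seq[start:start + len(consensus)]
    if len(window) < len(consensus):
        return False
    return all(base in SYMBOLS[sym] for sym, base in zip(consensus, window))


def find_sites(consensus, seq):
    """List (position, strand) hits of the consensus on either strand.

    Positions are 0-based and always refer to the leftmost base of the
    site on the strand that was passed in.
    """
    seq = seq.upper()
    consensus = consensus.upper()
    rc = reverse_complement(seq)
    n, k = len(seq), len(consensus)
    hits = []
    for i in range(n - k + 1):
        if matches_at(consensus, seq, i):
            hits.append((i, "+"))
        if matches_at(consensus, rc, i):
            # Map the hit on the reverse strand back to input coordinates.
            hits.append((n - k - i, "-"))
    return sorted(hits)


# Hypothetical consensus "TGAR" searched in two made-up sequences.
print(find_sites("TGAR", "AATGAGCC"))  # hit on the given strand
print(find_sites("TGAR", "CCTCATT"))   # hit only on the complementary strand
```

A palindromic consensus is reported once per strand at the same position, which may or may not be what a given analysis wants; deduplicating by position is a one-line change.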

The nucleotide sequence of a 2.3-kb SphI fragment containing the structural gene (phsA) for phenoxazinone 
synthase (PHS) of Streptomyces antibioticus was determined. The sequence was found to contain an open 
reading frame (ORF) with a G+C content of 71.5% oriented in the direction of transcription that was 
confirmed by primer extension. The ORF encodes a protein with an Mr of 70,223 consisting of 642 amino acids 
and is preceded by a potential ribosome-binding site. The codon usage pattern is in agreement with the general 
pattern for streptomycete genes, with a 92.5 mol% G+C content in the third position. The N-terminal sequence 
of the mature PHS subunit corresponds exactly to that predicted from the nucleotide sequence. Neither ATG 
nor GTG initiator codons were identified for the protein. However, a TTG codon was located near the amino 
terminus of the mature protein and is a good candidate for the initiator codon. The transcriptional start point 
of phsA was located 36 bp upstream of the start codon by primer extension. The -10 region of the putative 
promoter showed some similarity to the consensus sequence for the major class of prokaryotic promoters, but 
the -35 region was less similar. Comparison of the primary amino acid sequence of PHS of S. antibioticus with 
other amino acid sequences indicated that PHS is a blue copper protein with copper binding domains in the 
N-terminal and C-terminal regions of the polypeptide chain. A BsrBI fragment containing the promoter region 
of phsA and a portion of the ORF was shown to promote xylE expression when cloned in the streptomycete 
promoter probe vector pIJ2843. This phsA promoter-dependent xylE expression could be repressed by glucose 
in S. antibioticus when the organism was grown on glucose or galactose plus glucose. Thus, the cloned promoter 
region appears to contain the sequences responsible for catabolite repression of PHS production. 



Actinomycin is one of the antibiotics produced by the gram- 
positive actinomycete Streptomyces antibioticus (52). A putative 
pathway for actinomycin biosynthesis was proposed several 
years ago (50), and biochemical, physiological, and genetic 
studies have confirmed the essential details of that pathway. 
Five enzymes from S. antibioticus, Streptomyces chrysomallus, 
and Streptomyces parvulus have been isolated and characterized 
to demonstrate their involvement in the actinomycin biosynthetic 
pathway (6, 11, 22, 23, 28-31). One of these enzymes, 
phenoxazinone synthase (PHS), catalyzes the oxidative condensation 
of two molecules of 4-methyl 3-hydroxyanthraniloyl 
pentapeptide to form actinomycinic acid, which is the penultimate 
intermediate in the putative biosynthetic pathway (Fig. 
1). The enzyme was first identified by Katz and Weissbach (28) 
and subsequently purified by Choy and Jones (6). To date, 
phsA, the gene coding for PHS from S. antibioticus, is the only 
gene involved in actinomycin biosynthesis that has been cloned 
(25). 

Although essentially all of the enzymes required for actino- 
mycin production have been identified, little is known about 
the regulation of these enzymes and of overall actinomycin 
production. Of all the enzymes identified, PHS is perhaps the 
best characterized. It was shown some years ago by Marshall 
and coworkers that actinomycin production is repressed in S. 
antibioticus cultures grown on glucose or galactose plus glucose 
as compared with cultures grown on production medium with 
galactose alone as the carbon source (39). 

Catabolite control has been implicated in the expression of 
both PHS and actinomycin synthetase I (ACMSI; the enzyme 
which activates the precursors of the actinomycin chromophore 
in S. antibioticus [23, 30, 31]). PHS production was 
demonstrated to be subject to catabolite control shortly after 
the identification of the enzyme (13, 28). It is possible that 
phsA and acmsI are located in the same genomic region in S. 
antibioticus since it is well known that the genes for antibiotic 
production are clustered in the streptomycete genome (for 
examples, see reference 38). Therefore, the detailed molecular 
analyses of the mechanisms controlling the expression of phsA 
are essential to our understanding of the regulation of actinomycin 
biosynthesis and the synthesis of other antibiotics. We 
report here the nucleotide sequence and transcriptional analysis 
of phsA and identify the promoter region of the gene. We 
also demonstrate that a cloned fragment containing the putative 
promoter is active in a streptomycete promoter probe 
vector and that the activity of the promoter is repressed when 
S. antibioticus transformants containing the relevant constructs 
are grown on glucose or galactose plus glucose as compared 
with cultures grown on galactose as the sole carbon source. 

MATERIALS AND METHODS 

Organisms and growth conditions. The Streptomyces strains used were S. 
antibioticus IMRU 3720 and Streptomyces lividans 66 derivative TK24 (18). S. 
antibioticus was grown on liquid NZ-amine and galactose-glutamic acid media as 
described previously (13). S. lividans was generally grown on yeast extract-malt 
extract plus 34% sucrose (YEME) or on tryptone soy broth. For protoplast 
preparation, TK24 was grown on YEME with MgCl2 and glycine at the final 
concentrations of 5 mM and 0.5%, respectively (49). Protoplasts were allowed to 
regenerate on R2YE medium (49) for 12 to 24 h and then overlaid with 2 to 3 
ml of soft nutrient agar supplemented with thiostrepton at a final concentration 
of 500 µg/ml. Tyrosine at 0.075% (wt/vol) was added to the soft nutrient agar for 
overlaying when pIJ702 derivatives were used. 

Escherichia coli DH5α [F− φ80dlacZΔM15 Δ(lacZYA-argF)U169 endA1 recA1 
hsdR17 (rK− mK+) deoR thi-1 supE44 λ− gyrA96 relA1] and XL-1 Blue [recA1 
endA1 gyrA96 thi-1 hsdR17 supE44 relA1 lac (F' proAB lacIqZΔM15 Tn10 (Tetr))] 
were generally cultured in L broth or on L agar (35). E. coli-competent cells were 
prepared by the CaCl2 method and transformed as described by Sambrook et al. 
(46). After transformation with pUC19 and pBluescript SK+ derivatives, 
transformants were selected on L agar plates containing 100 µg of ampicillin per ml, 
40 mg of 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-Gal) per ml, and 
0.2 mM isopropyl-β-D-thiogalactopyranoside (IPTG). For single-stranded DNA 
preparation, strains containing pBluescript SK+ derivatives were grown in 2XYT 
medium in the presence of 100 µg of ampicillin per ml and helper phage 
VCSM13 for 2 h followed by the addition of 75 µg of kanamycin per ml and 
growth overnight. Growth temperatures for Streptomyces spp. and E. coli were 30 
and 37°C, respectively. 

FIG. 1. The PHS reaction. The penultimate step in the actinomycin biosynthetic 
pathway in S. antibioticus, the oxidative condensation of two molecules of 
4-methyl 3-hydroxyanthraniloyl pentapeptide to yield actinomycinic acid, is 
catalyzed by PHS. 

DNA manipulations. Plasmid and chromosomal DNAs were prepared as 
described previously (4, 17, 20) and analyzed by restriction digestion and agarose 
gel electrophoresis. In some experiments, restriction fragments were recovered 
from low-melting-point agarose as described by Favre (10). Protoplast 
preparation, transformation, and regeneration were as described previously (17, 20, 25). 
A list of plasmids used or generated in the present study is provided in Table 1. 
pJSE923 is a derivative of pIJ2501 (25) with an XbaI linker inserted at the PvuI 
site of phsA. pJSE929 contains the blunt-ended BsrBI subfragment of the phsA 
promoter region cloned into the HincII site of pUC19. pJSE935 contains the 
HindIII-BamHI subfragment of the phsA promoter region of pJSE929 cloned 
into HindIII-BamHI-digested pIJ2843 (7). 

Enzyme assays. Streptomyces cultures were grown in 250-ml flasks containing 
50 ml of glutamic acid-salts medium, 50 µg of thiostrepton per ml as necessary, 
and 5 mM CuSO4 at 28°C with shaking at 200 rpm. Cultures contained either 1% 
galactose, 1% glucose, or 0.5% galactose plus 0.5% glucose as carbon sources. 
The cultures were harvested 12 h after inoculation. Mycelium was washed in 100 
mM potassium phosphate (pH 7.5), suspended in a final volume of 2 ml of 
sample buffer (19), and disrupted by sonication. 

Catechol dioxygenase assays were performed and activities were determined 
spectrophotometrically as described previously (19, 54). Catechol dioxygenase 
specific activity was calculated as the rate of change in A375 per min per milligram 
of protein and converted to milliunits per milligram (45). Protein concentrations 
were determined with the bicinchoninic acid protein assay reagent kit from 
Pierce. The PHS assay was performed as described previously (6) with 
3-hydroxyanthranilic acid as the substrate. 
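
The conversion from a rate of change in A375 to milliunits per milligram can be sketched as follows. This is an illustration under stated assumptions, not the procedure of references 19, 45 and 54: one unit is taken as 1 µmol of product formed per minute, the Beer-Lambert law is applied with a 1-cm path, and the extinction coefficient must be supplied by the caller because the text does not state the value that was used. The figure of 33,000 M^-1 cm^-1 in the usage lines is only a placeholder of the kind often quoted for the catechol ring-cleavage product, and the other input numbers are made up.

```python
def specific_activity_mu_per_mg(delta_a375_per_min, protein_mg,
                                epsilon_per_m_cm,
                                reaction_volume_ml=1.0, path_cm=1.0):
    """Convert an absorbance rate into milliunits per mg of protein.

    Assumes 1 unit = 1 micromole of product per minute. The extinction
    coefficient (M^-1 cm^-1) is an input, not a value taken from the text.
    """
    if protein_mg <= 0 or epsilon_per_m_cm <= 0 or path_cm <= 0:
        raise ValueError("protein, epsilon and path length must be positive")
    # Beer-Lambert law: rate of change of product concentration (mol/L/min).
    molar_rate = delta_a375_per_min / (epsilon_per_m_cm * path_cm)
    # mol/L/min times litres gives mol/min; 1e6 converts to micromoles/min.
    umol_per_min = molar_rate * (reaction_volume_ml / 1000.0) * 1e6
    # Units -> milliunits, then normalize to protein in the assay.
    return 1000.0 * umol_per_min / protein_mg


# Made-up inputs: 0.33 A375/min, 0.01 mg protein, 1-ml assay,
# placeholder epsilon of 33,000 M^-1 cm^-1.
activity = specific_activity_mu_per_mg(0.33, 0.01, epsilon_per_m_cm=33000.0)
print(round(activity, 1))
```

Keeping the extinction coefficient as a required argument makes it hard to forget that the result scales inversely with whatever value is chosen.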

Nucleotide sequence analysis. Sequential deletion clones from both ends of 
the phsA SphI fragment were obtained by exonuclease III-mung bean nuclease 
digestion with the exonuclease III-mung bean deletion kit from Stratagene 
Cloning Systems. The phsA SphI fragment was subcloned into pBluescript SK+ 
(Stratagene) modified to contain an SphI site in the polylinker, and the resulting 
recombinant plasmids (pJSE900 and pJSE910) were used to create deletion 
clones suitable for sequencing. The nucleotide sequences of both DNA strands 
of the cloned phsA fragment were obtained by the dideoxy chain termination 
method (47). Single-stranded DNA was obtained with VCSM13 as a helper 
phage, and the DNA was prepared as described previously (26). The sequencing 
reactions were performed basically as described for the 7-deaza-GTP Sequenase 
kit from United States Biochemicals except that the extension and termination 
reactions were done at 50 and 70°C, respectively. The reactions were 
post-terminated at 70°C for 2.5 min by adding 2.5 U of Taq version 2.0 DNA 
polymerase and 1 µl of termination mixture, both from United States 
Biochemicals. Difficult compression areas and pause sites were resolved by using dITP 
instead of deaza-GTP. The DNA sequences were analyzed with the DNAsis 
program from Hitachi and the GCG program from the University of Wisconsin. 

The GenBank accession number for the S. antibioticus IMRU 3720 PHS gene 
(phsA) is U04283. 

Primer extension. In the primer extension experiments, a 24-base 
oligonucleotide primer, 5'-GATCTCGGTCTCCCGCGTCACCTC-3', that is located 528 
bp downstream of the 5'-SphI site and is complementary to the phsA mRNA was 
used to reveal the transcriptional start point. End labeling of the 5' terminus of 
the oligonucleotide primer with the polynucleotide kinase reaction and the 
primer extension reaction were done as described by Moran (42). RNA 
preparation was as described previously (17) with the following modifications. 
Mycelium was collected on a Whatman no. 4 filter disc by use of a vacuum line to 
accelerate the filtration process. The mycelium was quickly scraped off the filter 
into a universal bottle and resuspended in 5 ml of modified Kirby mixture at 4°C 
(modified Kirby mixture consists of 1% [wt/vol] sodium triisopropylnaphthalene 
sulfonate [Eastman Chemicals], 6% [wt/vol] sodium 4-amino salicylic acid 
[sodium salt; BDH], and 6% [vol/vol] Tris-EDTA-buffered phenol mixture, and all 
solutions were made up in 50 mM Tris-HCl [pH 8.3]). The contents were 
vortexed with 10 g of 4.5- to 5.5-mm-diameter glass balls as vigorously as possible 
for at least 2 min. Three milliliters of phenol-chloroform mixture was added, and 
the mixture was vortexed as described above. The homogenate was then 
transferred to a polypropylene tube (Falcon 2006) and centrifuged (10 min at 12,000 
× g and 4°C) to separate the phases. The aqueous layer was transferred to a fresh 
tube, and an additional 5 ml of phenol-chloroform mixture was added. The 
solutions were vortexed thoroughly for 2 min and centrifuged again as described 
previously to separate the phases, and this procedure was repeated until very 
little interphase material remained visible. One-tenth volume of 4 M sodium 
acetate (pH 6.0), followed by an equal volume of isopropanol, was added to the 
aqueous phase. The solutions were mixed and left at -20°C for 1 h. The nucleic 
acids were collected by centrifugation at 12,000 × g for 10 min, and the 
supernatant was discarded. The pellet was rinsed with absolute ethanol and vacuum 
dried. The pellet was resuspended in 180 µl of distilled water (treated with 
diethyl pyrocarbonate) and 20 µl of 10x DNase buffer (0.5 M Tris-HCl [pH 7.8], 
0.05 M MgCl2) and transferred to an Eppendorf tube. DNase (RNase-free; 
Sigma Chemical Co.) was added to a final concentration of 30 µg/ml. The 
solutions were incubated at room temperature for 30 min. An equal volume of 
phenol-chloroform mixture was then added, and the samples were mixed by 
vortexing. The phases were separated by centrifugation in a microcentrifuge, and 
the aqueous phase was transferred to a fresh tube. The aqueous phases were then 
extracted by adding an equal volume of chloroform. Total RNA was precipitated 
with 1/10 volume of 3 M sodium acetate (pH 6) and an equal volume of 
isopropanol for 2 h at -20°C, and the precipitate was collected by centrifugation. 
The RNA pellet was rinsed with 70% and then 100% ethanol, vacuum dried, 
resuspended in 100 µl of distilled water, and stored at -70°C. The quantity of 
RNA was assessed by spectrophotometry, and the quality was assessed by 
agarose gel electrophoresis. 



TABLE 1. Plasmids used or referred to in the present study 

Plasmid; Description; Source or reference 

pUC19 

pBluescript SK+; Phagemid cloning vector (Stratagene); the vector was modified to contain an SphI site in the 
polylinker 

pIJ702 

pIJ2501; The 2.3-kb phsA SphI structural gene from S. antibioticus cloned into the SphI site of pIJ702 

pIJ2843; Streptomyces low-copy-number promoter-probe vector 

pJSE900 and pJSE910; The 2.3-kb phsA SphI cloned in the SphI site of pBluescript SK+ in two orientations 

pJSE923; pIJ2501 with an XbaI linker at the PvuI site of phsA 

pJSE929; The blunt-ended, ca. 235-bp BsrBI subfragment of the phsA promoter region, extending from 
position -106 to +135 relative to the transcriptional start site, cloned into the HincII site of 
pUC19 

pJSE935; The ca. 265-bp HindIII-BamHI subfragment of the phsA promoter region of pJSE929 cloned 
into HindIII-BamHI-digested pIJ2843 

Determination of the amino-terminal sequence of the PHS subunit. The 
amino-terminal sequences of the cloned and native PHS proteins were determined 
at the Emory University Microchemical Facility and found to be identical. The 
sequence of the first 15 amino acids of the protein is Thr-Asp-Met-Ile-Glu-Gln- 
Ser-Asp-Asp-Arg-Ile-Asp-Pro-Ile-Asp. 

Enzymes and reagents. Restriction endonucleases were purchased from 
Boehringer-Mannheim Corporation, Gibco BRL, and Promega Corporation. Calf 
intestinal alkaline phosphatase, T4 DNA ligase, and avian reverse transcriptase 
were obtained from United States Biochemicals. Exonuclease III and mung bean 
nuclease were obtained from Stratagene. Sigma Chemical Co. supplied RNase, 
which was prepared as described previously (46). The 7-deaza-dGTP Sequenase 
version 2.0 and Taq version 2.0 DNA polymerase kits were purchased from 
United States Biochemicals. [γ-32P]dATP, [α-32P]ddT, and α-35S-dATP were 
purchased from Dupont New England Nuclear Products and Amersham. RNasin 
was obtained from Promega. All of the chemicals were of reagent grade or the 
highest purity commercially available. 

RESULTS 

Nucleotide sequence analysis. A detailed restriction map of 
the phsA SphI fragment constructed on the basis of the nucleotide 
sequence is shown in Fig. 2, and the nucleotide sequence 
of the fragment is shown in Fig. 3. Analysis of the DNA 
sequence with the FRAME codon preference program (3) 
revealed a 1,932-bp open reading frame with 71.5% G+C 
content, matching the codon usage of Streptomyces spp. (Fig. 
4). The open reading frame presumably starts with a TTG 
codon at nucleotide 348 and encodes a deduced polypeptide of 
642 amino acids with a predicted Mr of 70,223. Furthermore, 
the predicted initiator amino acid is only one position upstream 
of the N-terminal amino acid obtained by protein sequence 
analysis of purified PHS (the first 15 amino acids shown 
in Fig. 3; see Materials and Methods). Additional information 
on the putative translational start was obtained by inserting an 
XbaI linker downstream of this region (Table 1, pJSE923). The 
inserted linker created stop codons in all three reading frames. 
When the resulting recombinant plasmid, containing the XbaI 
linker in the PvuI site of phsA, was used to transform S. lividans, 
PHS expression from phsA was completely abolished 
(data not shown). These results rule out the possibility that the 
cloned fragment activates a normally silent phsA gene in S. 
antibioticus, as has been observed for S. lividans (25, 37). Upstream 
of the putative TTG start codon is the sequence 
GGGGG (Fig. 3, boxed), which may act as a ribosome binding 
site (48). A short stem-loop structure is located 4 bp downstream 
of the phsA stop codon (Fig. 3, inverted arrows), but its 
ability to function in transcription termination is problematic 
because of its length. 
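
The overall G+C content and the G+C content at third codon positions quoted for the open reading frame are simple base counts. The sketch below shows how such figures can be computed for any in-frame sequence. The function names are hypothetical, and the 21-nucleotide example is only the short stretch of Fig. 3 that encodes Ile-Glu-Gln-Ser-Asp-Asp-Arg, so its percentages are not the values reported for the complete gene.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def gc3_percent(orf):
    """Percentage of G and C at third codon positions.

    The sequence is assumed to start in frame; trailing bases that do
    not complete a codon are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    third_positions = orf[2:usable:3]
    return gc_percent(third_positions)


# Seven codons read from Fig. 3: ATC GAG CAG AGC GAC GAC CGG.
fragment = "ATCGAGCAGAGCGACGACCGG"
print(round(gc_percent(fragment), 1))
print(gc3_percent(fragment))
```

Running the same two functions over the full ORF taken from accession U04283 would be the way to check the 71.5% and 92.5% figures; that sequence is not reproduced here.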

Primer extension analysis and identification of the putative 
phsA promoter. A 24-mer oligonucleotide primer, corresponding 
to sequences 530 bp downstream of the 5' SphI site and 180 
bp downstream of the translational start codon (Fig. 3), was 
used in primer extension studies to locate the 5' end of the 
phsA transcript (Fig. 5). RNA templates were prepared from S. 
antibioticus and S. lividans as indicated in the legend to Fig. 5. 



The transcriptional start point (tsp) of the phs message revealed 
by this analysis is located at the A residue which is 313 
bp downstream of the 5' SphI site and 36 bp 5' to the translation 
initiation codon. The transcription start point of the 
cloned phsA gene in S. lividans TK24 is the same as that of the 
chromosomal gene in S. antibioticus (Fig. 5). In addition, there 
is no difference in the tsp shown in the primer extension experiments 
using total RNA prepared from glucose- or galactose-grown 
cultures (data not shown). However, glucose-grown 
cultures contained less phs-specific message than galactose-grown 
cultures. This observation is consistent with earlier data 
suggesting that the decreased level of PHS observed in cultures 
grown on glucose as compared with that in galactose-grown 
cultures is due in part to an effect at the level of phs transcription 
(20, 21). 
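
Because the primer is complementary to the phsA mRNA, its reverse complement is the stretch of coding strand to which it anneals. The short sketch below derives that stretch from the primer sequence given in Materials and Methods; the function name is hypothetical. The last twelve bases of the result, GAGACCGAGATC, can be read directly in the Fig. 3 listing near position 530.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Primer as printed in Materials and Methods, written 5' to 3'.
PRIMER = "GATCTCGGTCTCCCGCGTCACCTC"

# Coding-strand sequence that the primer anneals to, written 5' to 3'.
target = reverse_complement(PRIMER)
print(target)  # GAGGTGACGCGGGAGACCGAGATC
```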

On the basis of primer extension studies, putative -10 and 
-35 promoter regions were located relative to the transcrip- 
tion start point (Fig. 3). There are also other interesting fea- 
tures which are located near the promoter region, including 
several sets of direct repeat sequences, two sets of inverted 
repeats, and two TNTNAN sequences (Fig. 3). These sequences 
are noteworthy because they may be involved in the 
catabolite control of the phsA gene (41). The function of these 
sequences will be examined in detail in subsequent studies. 

Confirmation of the presence of a functional promoter upstream 
of the transcription start site was obtained by promoter 
probe cloning. In these experiments, a BsrBI fragment from 
phsA (see Fig. 2 and 3) was inserted upstream of the xylE gene 
in the promoter probe vector pIJ2843 (7, 36). The resulting 
recombinant plasmid was used to transform S. antibioticus and 
S. lividans, and mycelial extracts were prepared after 19 h of 
growth of control and transformed cultures in liquid media. 
The results of catechol dioxygenase assays of those extracts 
revealed that the untransformed strains contained negligible 
levels of enzyme activity, as was also the case for strains transformed 
with pIJ2843. In contrast, S. antibioticus and S. lividans 
strains containing pJSE935, with the putative promoter fragment, 
showed significant levels of xylE activity (data not 
shown). Thus, the BsrBI fragment does possess promoter activity, 
and the promoter probe results support the identification 
of the promoter region of phsA suggested by the sequencing 
and primer extension studies. The use of pJSE935 in 
studies of glucose repression of phsA is described below. 

Sequence comparisons with PHS sequence. The deduced 
amino acid sequence of PHS was compared with entries in 
protein databases provided by GenBank by use of the FASTA 
program. The sequence with the greatest homology to PHS 
was that of bilirubin oxidase from Myrothecium verrucaria (32). 
There is 26% identity and 45% similarity between the se- 
quences of PHS and the bilirubin oxidase protein. A lower 
homology (18% identity, 40% similarity) was found for the 

GGCATGCAGCAGTTCGAGCACATCATCGCCCACGAACTCCCGCTCCTGTACGACGACTTC 

GACCTGTCCGCCGAGGCGCGCG CGGCGATG GACAACTACGTGCCCGACCT^ 

CTCTCGGGCATCCT CAACTGqgATC(^CCG GGTGGACCGTTACAAGTCCGCGTACC^ 

AGCCGGGOXACGGGITCGTCCCGGACCGCTCTTCGGCCACTCCGCCCCCACACCTGCTG 

CTTTCCGGAClITCATATACGGAATAGGGCATTCAf^il^A^VCPCGCGAGGGGCGCCGGGCG 

TCTCATG GGGCAAGGGGAA CgACCGACfiGAACACfeGGGGtrGTTC GACCTT gAC 



LGA£GGAGTj 
I D "G 



;cie 



Sau3A 

ATCGAGCAGAGCGACGACCG GAJCGAC CCC 

I E 0 S D D R 

BsrBI 

CyGGAC GACGT'rCTGGCGAAGGAGCGGGAGCAGCyTArCGGCG CqCGG 
7 A D DVLAKEREQ A » P A "P » G E L T P 
TTCGCCGCGCCGTTGACCGTCCCGCCCGTCCTGCGGCCCGCTTCGGAC G^GGTGACGCGG 

FAAI'b'l'V PPVLRPASDEVTR 
GAGACCGAGATC GCCCTGCGCCCCACCTGGGTGCGCCTGCACCCGCAGCTCCCCCCCACC 

ETEIALRPTWVRLUPQLPPT 
CTGATCTGGGGCTACGACGGCCAGGTGCCGGGCCCCACCATCGAGGTCCGGCGrOGACAG 

LMWGYDGQVPGPTIEVRRCCf 
CGCGTCCGCATCGCCTGGACCAACCGCATCCCCAAGGGGAGCGAGTACCCGGTCACCTCC 

RVRIAW'I'NKl PKGSEYPVTS 
CTCGAGGTGCCCCTCCGCCCGCCGGGCACCCCGGCACCGAACACCGAACCGGGGCGCGGC 

VEVPLGPPGTPAPNTEPGRG 
GGCGTCGAACCCAACAAGGACGTCGCCGCGCTGCCCGCCTGGTCCGTCACCCATCTGCAC 

GVEPNKDVAALPAWSVTHLH 



CGCXSCGCAGACCGGCGGCGGCAACGACGGCTCGGCGGACAACGCCGTCGGCTTCGGCGAC 
GAQTGGGNDGWADNAVGFGD 
Etomain \^^mmm^ 

CCCCAGCTCTCCGAGTATCCGAACGACCACCAGGCGACCCAGTGGTGGTACCACGACCAC 
AQLS EYPNDHQATQWWYHDH 

""^Domain 2"^—^ 

GCCATGAACA'KyvCCCGGTGGAAOn'GATGGCGGGCCTGTAOOGCACCTACCTGGTCC^ 

AMNITRWNVMAGLYGTYLVR 
GACOACGAGGAGGACGCCCIWWCCTWCCTCCGGCGACCGGGAGATCCCGCTGCTGATC 

DDBEDAI. GLPSGDREI PLLI 
GCCGACCCCAACCTCGACAC(^3ACGAGGACGGCCGGei^AACGGACGGCTCCTGCACAAG 

ADRHL.DTDEDGRLNGRLLKK 
ACGGTGATCGTCCAGCAGTCCAACCCGGAGACCGGCAAGCXTGGTGTCCAT^CCGTTCTTC 

TVIVOQSNPETGKPVSIPFF 
GGCCCGTACACCACGGTCAACGGCCGGATCTGGCCGTACGCCGACGTCGACGACGGCTCG 

GPYTTVMGRIWPYADVDDGW 
TATCCGCTCCGGCTGGTCAACGCGTCCAACGCGCGCATCTACAACCTCGTCCTGATCGAC 

YRLRLVMASNARIYNLVLI D 
GAGGACGACCGGCCGGTGCCGGGCGTCGTCCACCAGATCGGCACCGACGGCGGACTGCTG 

EDURPUPGVVHOIGSDGGLL 
CCGCGCCCGCTGCCCGTCGACTTCGACGACACCCTCCCGGTGCTGAGCGCCGCCCCGGCC 

PRPVPVDFDDTLPVLSAAPA 
GAGCCOTTCCACCTGCTGGTCGACTTCCnCGCaCTCGGCGGCCGCCGGCTGCGGCTGGTC 

ERFDLLVDFRALGGRRbHLV 

BamHI 

GACAACGGGCCGGGCGCGCCCGCCGGGACGCCGGATCCGCTGGGCGGGGTGCGCTACCCC 
DKGPGAPAGTPDPLGGVRYP 



EVMEPRVRETCEEDSFALPE 
GTGCTCTCCGGGTCCTTCCGCCGGATGAGCCACGACATCCCGCACGGCCACCXX5CTGATC 

VLSGSFRRKSHDI PHGHRLI 
GTGCTGACCCCGCCCGGCACCAAGGGCTCCGGCGGCCACCCGGAAATCTGGGAGATGGCC 

VLTPPCTKGSGGHPEIWEMA 
GAGGTCGAGGACCCGGCCGACGTCCAGGTCCCCGCCGAGGGCGTCATCCAGGTCACCGGC 

E VEDPADVQVPAEGVIQVTG 
GCCGACGGCCGTACCAAGACGTACCGCCGTACGGCAGCGACGTTCAACGACGCTCTCGGC 

ADGRTKTYRRTAATFNDGLG 
TTCACCATCGGCGACGCCACCCACGAGCAG-1'OGACC'n'CC'l'CAACCrCTCGCCGAl'CCTC 

PTIGEGTHEQWTFLNLSPIL 
CACCCCATGCACATCCACCrGGCCGACTTCCAGGTCCTCGGCCGCGACGCCTACGACGCG 

HPMHIHLADPQVLGRDAYDA 

TCCGGCTTCGACCrCGCCCrCGGCGGCACCCGCACCCCGGTGCGGCTCGACCCGGACACC 
SGFDLAL. GGTRTPVRLDPDT 

CCGGTCCCGCTCGCCCCCAACGAGCTGGGACACAAGGACGTCTTCCAGGTGCCGGGCCCG 
PVPLAPNELGHKUVFQVPGP 

CAAGGGCTGCGCGTGATGGGCAAGTTCGACGGGGCGTACGGCCGGTTCATGTACCACTCC 
QGLRVKGKFDGAYGRFKYHC 
GCCCTGAAGTTCGACCACGGCGGGGCGCACGGCGGCCACGGCGAGGGTCACACGGGCTGA 
ALKFDHGGAHGGHGEGKTG* 

CGCCCCqCCCGCqpGGCATGGC 7. 3 0 2 

FIG. 3. Nucleotide sequence of the SphI fragment containing the S. antibioticus 
phsA gene. Important restriction enzyme sites are indicated above the 
sequence. The boxed region denotes the possible ribosomal binding site 
(Shine-Dalgarno sequence [S/D]). The potential stem-loop is indicated by a pair of 
inverted arrows located toward the end of the sequence. Potential -10 and -35 
regions have lines above the sequence and are discussed in the text. The location 
of the transcription start point is indicated by an upward arrow and bold type, 
and the primer for primer extension is shown by the long arrow under the 
nucleotide sequence around position 530. The direct and inverted repeat 
sequences are indicated by arrows and numbered in pairs (1 to 8). The TNTNAN 
elements are shown in bold type and are located around 253 and 303 bp from the 
5' SphI site. Four potential copper binding domains are also indicated in the 
sequence (solid bars). These sequence data appear in the EMBL, GenBank, and 
DDBJ nucleotide sequence data libraries under accession number U04283. 



sequence of manganese-oxidizing protein from Leptothrix discophora 
(8). Copper binding motifs of all three proteins are 
aligned in Fig. 6. All three proteins are involved in oxidation 
reactions, but only PHS and bilirubin oxidase belong to the 
family of blue copper proteins (2, 12, 32). Sequence compari- 
son of PHS with bilirubin oxidase, manganese-oxidizing pro- 
tein, and several other blue copper proteins revealed the pres- 
ence of four regions in the sequence of the former protein 
corresponding to the potential copper binding domains found 
in the sequences of the blue copper proteins (Fig. 6). The 
finding of these copper binding domains confirms PHS as a 
blue copper protein (2, 12). This result is not at all surprising, 
since PHS has been shown to require copper for activity (2). 
The amino acid sequence of PHS contains consensus domains 
for the copper binding regions of the same types (I, II, and III), 
which were revealed by X-ray crystallography of ascorbate 
oxidase from zucchini (40). However, there are just two copper 
binding domains found in the manganese-oxidizing protein. 
We speculate that the copper binding domains are components 
of the catalytic sites of these enzymes. 
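
Figures such as "26% identity and 45% similarity" are counts over the aligned positions of two sequences. The sketch below shows one way to compute them from a pairwise alignment. It is not the FASTA program's own calculation: columns containing a gap are simply skipped, and the groups of "similar" residues are a hypothetical grouping chosen for illustration. The two aligned strings are made up as well.

```python
# Hypothetical groups of residues treated as similar to one another.
SIMILAR_GROUPS = ["ILVM", "FYW", "KRH", "DE", "ST", "NQ", "AG"]


def _aligned_pairs(aln_a, aln_b, gap="-"):
    """Residue pairs from two aligned sequences, skipping gapped columns."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aln_a.upper(), aln_b.upper())
             if x != gap and y != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return pairs


def percent_identity(aln_a, aln_b):
    """Percentage of ungapped columns with identical residues."""
    pairs = _aligned_pairs(aln_a, aln_b)
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)


def percent_similarity(aln_a, aln_b):
    """Percentage of ungapped columns that are identical or same-group."""
    pairs = _aligned_pairs(aln_a, aln_b)

    def similar(x, y):
        return x == y or any(x in g and y in g for g in SIMILAR_GROUPS)

    return 100.0 * sum(1 for x, y in pairs if similar(x, y)) / len(pairs)


# Made-up alignment: six ungapped columns, four identical, one D/E pair.
a = "HWHDHKA"
b = "HWYEH-A"
print(round(percent_identity(a, b), 1), round(percent_similarity(a, b), 1))
```

Both the treatment of gaps and the similarity grouping change the reported percentages, so values from different programs are not directly comparable.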

Expression of the cloned phsA promoter is repressed by 
glucose in S. antibioticus. The production of PHS in S. antibioticus 
was demonstrated some years ago to be subject to catabolite 
control (13, 28). As has been mentioned, later studies 
suggested that the production of PHS was regulated at the 
transcriptional level (20, 21). In the present study, the effects of 
glucose on the expression of the promoter active fragment 
cloned in pJSE935 were examined in S. antibioticus. Transformants 
containing pJSE935 were grown on 1% galactose, 1% 
glucose, or a mixture of 0.5% galactose and 0.5% glucose. 
Catechol dioxygenase assays were performed on extracts of 
mycelium harvested 12 h after inoculation of the growth me- 
dia. PHS assays were performed on these same extracts. The 
results of these experiments, presented in Fig. 7, show that 
glucose represses the expression of the phsA promoter in 
pJSE935 in the presence or absence of galactose. Thus, the 
effects of glucose on the phsA promoter would seem to fit the 
classical definition of catabolite repression, which requires that 
expression of the relevant gene be inhibited when the organism 
in question is grown on the repressing and (relatively) nonre- 
pressing carbon sources simultaneously. It is significant that 
the PHS activity in the mycelial extracts exactly paralleled the 
xylE activity; PHS production was inhibited when the organism 
was grown on glucose alone or on galactose plus glucose (Fig. 
7). 

One possible mechanism for catabolite repression of phsA 
expression would involve the binding of a repressor protein to 
operator sequences in the promoter region of the gene. Such 
mechanisms have been suggested to explain glucose repression 
in other streptomycetes (for examples, see references 9 and 
51). To examine this possibility in S. antibioticus, the effects of 
carbon source on PHS activity were measured in transformants 
containing pJSE923, in which the phsA gene is disrupted by an 
XbaI linker. As controls, PHS activity was measured in untransformed 
S. antibioticus and in transformants containing 
pIJ702 and pIJ2501. We reasoned that if the phsA promoter 
region possesses a repressor binding site, it might be possible 
to titrate the repressor by cloning that site at high copy in S. 
antibioticus. However, transformants containing pJSE923 showed 
the same pattern of PHS expression when grown on galactose, 
glucose, or glucose plus galactose as did the wild-type strain 
(data not shown). Thus, although the XbaI linker effectively 
prevented expression of the cloned phs gene, the presence of 
the disrupted gene at high copy did not abolish glucose repression 
of the endogenous phsA gene. 

FIG. 4. Analysis of the DNA sequence with the FRAME program (3) revealed a 1,932-bp open reading frame matching the codon usage of Streptomyces spp. 



DISCUSSION 

In the present study, we have characterized the cloned PHS 
gene from S. antibioticus. The Mr of the PHS subunit deduced 
from the nucleotide sequence data is 70,223. This value differs 
from the apparent value of 88,000 estimated by sodium dodecyl 
sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) 
(25). The explanation for the anomalous migration of this 
protein on SDS-PAGE is not clear. Previous studies have not 
revealed the presence of carbohydrate or other substances 
covalently associated with PHS, but other features of the protein 
presumably cause it to migrate in an unexpected fashion. 
Two native forms of PHS, large and small (L and S), were 
reported previously to have Mrs of 540,000 and 180,000 and to 
be composed of six and two PHS subunits, respectively (6). On 
the basis of the deduced Mr of the PHS subunit, the corresponding 
values for L and S would be about 420,000 and 
140,000, respectively. 
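
The revised values for the L and S forms follow from multiplying the deduced subunit Mr by the reported subunit counts. The arithmetic is sketched below with hypothetical variable and function names. Dividing the previously reported native Mrs by the same counts gives 90,000 per subunit, close to the apparent SDS-PAGE value of 88,000.

```python
SUBUNIT_MR = 70223  # Mr of the PHS subunit deduced from the nucleotide sequence


def oligomer_mr(n_subunits, subunit_mr=SUBUNIT_MR):
    """Mr of an oligomer made of n identical subunits."""
    return n_subunits * subunit_mr


large_form = oligomer_mr(6)  # six subunits
small_form = oligomer_mr(2)  # two subunits
print(large_form, small_form)  # 421338 140446

# Subunit size implied by the previously reported native Mrs.
print(540000 / 6, 180000 / 2)  # 90000.0 90000.0
```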

The results of promoter probe cloning and nucleotide se- 
quence analysis of the putative phsA promoter support the 
identity of the -10 and -35 regions and the transcriptional 
start point of phsA. Only a single start point was observed in 
the experiments illustrated in Fig. 5 and their replicates. It is 
possible that the tsp identified here is artifactual, but it is 
significant that the use of that start point identifies -10 and 
-35 regions with significant homology to the P2 promoter of 
the agarase gene (reference 5 and unpublished results). The 
-10 region, TCTCAT, of the phsA promoter showed more 
similarity to the -10 consensus sequence, TATAAT, of E. coli 
(15, 16) than to the -10 consensus sequence, TAGGAT, of 
Streptomyces promoters (48). The -35 region of the putative 
phsA promoter was not strikingly similar to the -35 consensus 
sequence of either E. coli or Streptomyces promoters (Fig. 3) 
(see references 15, 16, and 48). Overall, the phsA promoter is 
not strongly homologous to any promoters for other antibiotic 
genes from Streptomyces spp. (48). However, recent studies do 
suggest similarities to the P2 promoter of the agarase gene 
(dagA) from Streptomyces coelicolor (reference 5 and unpub- 
lished data). Preliminary data also suggest that the phsA pro- 
moter is recognized by an alternative σ factor (34). The 
role of this σ factor in S. antibioticus will be described in a 
subsequent publication. 

FIG. 5. Mapping of the 5' end of the phsA mRNA. A [γ-32P]ATP end-labeled 
24-mer oligonucleotide (5'-GATCrCGGTCTCCCGCGCGTCACC-3') was an- 
nealed to phsA mRNA and extended with reverse transcriptase. The reaction 
products were separated on a sequencing gel with a sequencing ladder, generated 
by the use of the same primer, to determine the transcription start site of phsA 
mRNA. RNA templates were from S. lividans TK24 (lane 1), S. antibioticus (lane 
2), and S. lividans transformed with pIJ2501 (lane 3). The arrow indicates the 
primer extension product that corresponds to initiation from phsA. The sequence 
on the left is the DNA region around the apparent transcription start site for 
phsA, indicated by an asterisk. Although the band corresponding to the extension 
product obtained with RNA from S. antibioticus is faint in the reproduction 
shown here (lane 2), it was clearly visible in the original autoradiograms. 

FIG. 6. Alignment of the putative copper-binding motifs of PHS, several blue 
copper proteins, and manganese-oxidizing protein. Sequence identities between 
PHS and the other blue copper proteins and manganese-oxidizing protein are 
boxed. The numbers to the left of the motifs denote the positions in the corre- 
sponding protein sequences. The amino acid residues corresponding to potential 
copper binding sites of three recognized types (40) are shown as follows: type I, 
*1; type II, *2; type III, *3. Dashes represent gaps introduced to maximize the 
similarity. Protein sequences were from the following sources: BO, Myrothecium 
verrucaria bilirubin oxidase (32); CP, human ceruloplasmin (33); A. LA, Aspergil- 
lus nidulans laccase (1); N. LA, Neurospora crassa laccase (14); C. AO, cucumber 
ascorbate oxidase (44); Z. AO, zucchini ascorbate oxidase (40); PC, poplar plas- 
tocyanin (43); AZ, Alcaligenes denitrificans azurin (43); MO, Leptothrix disco- 
phora manganese-oxidizing protein (8). 

One noteworthy feature of the phsA sequence is the pres- 
ence of several sets of direct and inverted repeats near the 
promoter region (Fig. 3). This is especially interesting since 
some direct and inverted repeat sequences have been reported 
to be involved in the regulation of gene expression in strepto- 
mycetes. For example, repeated sequences have been impli- 
cated in the catabolite control of Streptomyces genes, including 
the chitinase genes of Streptomyces plicatus (9), the galP1 pro- 
moter of the galactose operon of S. lividans (41), and α-amy- 
lase promoters of Streptomyces limosus (51). None of the phsA 
direct or inverted repeat sequences is strikingly similar to the 
repeat sequences in the studies described above. However, it is 
possible that repeat motifs are a common feature of the re- 
gions involved in catabolite repression of streptomycete genes. 
The phsA sequence also contains two TNTNAN elements, 
located within the -10 region and upstream of the phsA pro- 
moter region (Fig. 3). TNTNAN hexamers were suggested to 
play a role in galP1 regulation in S. lividans (41). 

The predicted amino acid sequence of the PHS subunit 
resembles that of proteins belonging to the blue copper protein 
family. Like most members of this group (32), the sequence of 
the PHS subunit contains four consensus domains (1 to 4) that 
are presumed to bind the copper ligands (Fig. 6). Domains 1 
and 2 are located at the N-terminal portion of the protein, 
whereas domains 3 and 4 are nearer the C terminus. Even 
though manganese-oxidizing protein does not belong to the 
blue copper protein family, similarities were observed in the 
copper binding domains 1 and 2 between PHS and the man- 
ganese-oxidizing protein (Fig. 6). In spite of the diverse distri- 
bution of these proteins and their utilization of very different 
substrates, they all use molecular oxygen in the reactions they 
catalyze. Although the active sites of these enzymes have not 
been characterized, the conservation of the copper binding 
sites strongly suggests their involvement in substrate recogni- 
tion and catalysis. 

In this study, we provided evidence for the regulation of the 
phsA promoter by catabolite repression. Thus, growth of S. 
antibioticus containing the cloned phsA promoter on glucose or 
glucose plus galactose led to a significant inhibition of xylE 
expression from pJSE935 as compared with that of cultures 
grown on galactose alone (Fig. 7). An identical pattern of 
inhibition was observed for PHS activity in the same cultures. 
There are several important implications of this result. First, 
the data strongly suggest that the sequences required for ca- 
tabolite repression are contained within the BsrBI fragment 
cloned in pJSE935. This fragment lacks the repeats 1, 2, and 8 
of Fig. 3. Thus, those sequences are presumably not required 
for catabolite repression. Second, it is clear that whatever the 
mechanism of catabolite repression in S. antibioticus, the rel- 
evant machinery can act simultaneously on the endogenous phs 
promoter and on the cloned sequence, since PHS activity par- 
allels xylE activity in the experiments illustrated in Fig. 7. With 
regard to that mechanism, we presented evidence above that it 
may not involve a simple interaction between an operator and 
a repressor as it was not possible to release expression of the 
endogenous phs gene from catabolite repression by cloning the 
disrupted gene at high copy. It is possible, of course, that the 
repressor binding site involves sequences that were disrupted 
by the insertion of the XbaI linker. It should be possible to 
distinguish between these possibilities and to learn more about 
the mechanism of catabolite repression of phsA expression by 
gel mobility shift assays. 

FIG. 7. Effects of carbon source on the expression of the phsA promoter in S. 
antibioticus. Transformants containing pJSE935 were grown on galactose, glu- 
cose, or glucose plus galactose as described in Materials and Methods. The 
figure shows the results of catechol dioxygenase and PHS assays of extracts of 
mycelium harvested 12 h after inoculation. Results represent the averages of 
three replicates. The values obtained for extracts grown on galactose (42.4 ± 2.1 
mU/mg of protein for catechol dioxygenase and 7.S.6 ± .S.3 U/mg of protein for 
PHS) were arbitrarily set at 100 for purposes of presentation. 

We constructed a library of synthetic promoters for Lactococcus lactis in which the known consensus 
sequences were kept constant while the sequences of the separating spacers were randomized. The library 
consists of 38 promoters which differ in strength from 0.3 up to more than 2,000 relative units, the latter among 
the strongest promoters known for this organism. The ranking of the promoter activities was somewhat 
different when assayed in Escherichia coli, but the promoters are efficient for modulating gene expression in this 
bacterium as well. DNA sequencing revealed that the weaker promoters (which had activities below 5 relative 
units) all had changes either in the consensus sequences or in the length of the spacer between the —35 and 
-10 sequences. The promoters in which those features were conserved had activities from 5 to 2,050 U, which 
shows that by randomizing the spacers, at least a 400-fold change in activity can be obtained. Interestingly, the 
entire range of promoter activities is covered in small steps of activity increase, which makes these promoters 
very suitable for quantitative physiological studies and for fine-tuning of gene expression in industrial biore- 
actors and cell factories. 



Metabolic engineering has promising perspectives with re- 
spect to improving the properties and performances of micro- 
organisms used as industrial bioreactors, as cell factories, and 
in food fermentations. The importance of tuning gene expres- 
sion in this context, i.e., to perform metabolic optimization 
rather than massive overexpression or gene inactivation, is now 
far more appreciated. However, the more subtle approach of 
metabolic optimization is hampered by the lack of proper 
expression systems for tuning gene expression in many micro- 
organisms. Also, the fundamental understanding of a biologi- 
cal system through metabolic control analysis (5, 10) requires 
the tuning of enzyme activities in order to calculate the so- 
called control coefficients. For some organisms, expression sys- 
tems that allow for changing gene expression for scientific 
purposes and for a limited set of experimental conditions have 
been developed. Thus, for Escherichia coli, the lac system, the 
cI-regulated lambda P_R/P_L, and many derivatives of these sys- 
tems have been widely applied, and such systems have also 
been adapted for use in other organisms (for a recent review, 
see reference 12). With respect to changing steady-state gene 
expression, these systems can sometimes be difficult to apply, 
particularly when it comes to changing gene expression on an 
industrial scale. Besides, in most food fermentation processes, 
the addition of chemicals as inducers of gene expression or the 
changing of other process parameters is not acceptable; in such 
cases, there are virtually no expression systems available for 
tuning gene expression and thus for performing accurate met- 
abolic optimization. 

Lactic acid bacteria are widely used in food fermentation, 
e.g., cheese and yoghurt production, but besides lactic acid, 
these bacteria excrete a spectrum of organic compounds. Some 
of these are desirable with respect to the development of 
texture and flavors or for bioconservation purposes, and some 
are undesirable for similar or different reasons. The lactic acid 
bacteria are therefore obvious candidates for attempts to op- 
timize the pattern of formation of these compounds for specific 
applications. But the experimental tools for manipulating gene 
expression are not well developed for these bacteria. An ex- 
ception is the nisin-inducible system, developed recently by de 
Ruyter et al. (2). This system appears to be well suited for 
inducing gene expression in Lactococcus lactis by adding the 
antibiotic nisin (which is accepted as a food additive). A ques- 
tion that perhaps needs to be addressed in this context is 
whether the nisin expression system is also suitable for achiev- 
ing a steady level of gene expression. In addition, for effective 
metabolic optimization, it is often necessary to optimize the 
expression of a number of genes, which is not feasible with the 
systems developed so far. 

* Corresponding author. Mailing address: Department of Microbi- 
ology, Technical University of Denmark, Building 301, DK-2800 Lyn- 
gby, Denmark. Phone: 45 45252510. Fax: 45 45932809. E-mail: prj@im.dtu.dk. 

Here we describe a method for tuning steady-state gene 
expression in L. lactis. We overcome many of the limitations 
discussed above by using libraries of synthetic promoters which 
cover a wide range of promoter activities and show that the 
strength of prokaryotic promoters can be modulated by ran- 
domizing the spacer sequences that separate the consensus 
sequences. The system is food grade and well suited for use in 
industrial bioreactors and food fermentation processes. In ad- 
dition, the system should be applicable to a broad range of 
biological systems. (Potential commercial users should be 
aware that the approach for obtaining the synthetic promoters, 
as well as the promoter sequences, were filed for patent world- 
wide [7a]). 

MATERIALS AND METHODS 

Bacterial strains and plasmids. The E. coli K-12 strain BOE270 (1) is highly 
competent with respect to transformation and was derived from strain MT102, 
which in turn is an hsdR derivative of strain MC1000 [araD139 Δ(ara-leu)7679 
galU galK Δ(lac)174 rpsL thi-1 (1a)]. BOE270 was used for studying promoter 
activities in E. coli as well as for cloning purposes and propagation of plasmid 
DNA in E. coli. The plasmid-free L. lactis subsp. cremoris strain MG1363, which 
does not express β-galactosidase activity (4), was used for studying promoter 
activities in L. lactis. 

The promoter cloning vector pAK80 (7) was used for cloning the synthetic 
promoter DNA fragments. pAK80 is a shuttle vector for L. lactis and E. coli, 
conferring erythromycin resistance to the host cells. The vector carries the 
promoterless lacL and lacM genes from Leuconostoc lactis (which code for 
β-galactosidase enzyme activity). It contains a multiple cloning site for the 
insertion of DNA fragments harboring putative promoter signals, just upstream 
of the promoterless lacL and lacM genes from Leuconostoc lactis. Together, the 
lacL and lacM genes code for a β-galactosidase. 

FIG. 1. Strategies used for cloning synthetic promoter fragments into the promoter cloning vector pAK80. (a) Double-stranded DNA fragments carrying putative 
promoter activities. (b) Restriction map and schematic representation of the relevant parts of the promoter cloning vector. The stippled and solid lines show the 
strategies used for cloning pCP1 through pCP29 and pCP30 through pCP46, respectively. (c) Restriction map of clones pCP1 through pCP29. (d) Restriction map of 
clones pCP30 through pCP46. Note that a number of clones have been subject to cloning artifacts and thus may have a slightly different restriction map. BI, BamHI; 
AII, AflII; Ss, SspI; N, NsiI (PstI compatible); Nr, NruI; Sc, ScaI; HII, HincII; P, PstI; PII, PvuII; E, EcoRI; Sa, SacI; Xh, XhoI; BII, BglII; Sm, SmaI; Xb, XbaI (not 
drawn to scale). 

Enzymes. Restriction enzymes, Klenow DNA polymerase, calf intestine phos- 
phatase, and T4 DNA ligase were obtained from and used as recommended by 
Pharmacia and New England Biolabs. 

Oligonucleotides. Oligonucleotides were obtained from Hobolth DNA Syn- 
thesis (Hillerød, Denmark). 

Second-DNA-strand synthesis. The single-stranded promoter oligonucleotides 
were converted to double-stranded DNA, using a 10-bp oligonucleotide 
(5'-CCGAATTCAG) complementary to the 3' end of the promoter oligonucleotide as 
primer for the second-strand synthesis by the Klenow fragment of DNA poly- 
merase I. 
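
That the 10-mer primer anneals to the 3' end of the promoter oligonucleotide can be verified by taking its reverse complement. The sketch below is an illustration only: the 3'-terminal stretch is copied from the oligonucleotide design in Fig. 2B, and the names `reverse_complement`, `OLIGO_3PRIME_END`, and `PRIMER` are introduced here for the example.

```python
# Translation table for Watson-Crick base pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


# Last 18 nt of the promoter oligonucleotide as drawn in Fig. 2B (5' to 3').
OLIGO_3PRIME_END = "AACTGCAGCTGAATTCGG"
# Second-strand primer from Materials and Methods (5' to 3').
PRIMER = "CCGAATTCAG"

# The stretch of the template that the primer pairs with.
primer_site = reverse_complement(PRIMER)
anneals_at_3prime_end = OLIGO_3PRIME_END.endswith(primer_site)
print(primer_site, anneals_at_3prime_end)
```

The primer site covers the EcoRI site (GAATTC) at the very end of the template, so extension by the Klenow fragment copies the whole promoter region.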

Cloning of synthetic DNA fragments into the promoter cloning vector pAK80. 
Two different cloning strategies were used (Fig. 1). In strategy A, the mixture of 
DNA fragments was digested with two restriction enzymes, HincII and SspI, and 
pAK80 was digested with SmaI. In strategy B, the mixture of DNA fragments was 
digested with two restriction enzymes, BamHI and PstI, and pAK80 was digested 
with BglII and PstI. In both strategies, the promoter fragments were then ligated 
to the compatible vector fragments. The ligation mixtures were then transformed 
into Ca2+-competent cells (13) by using a standard transformation procedure 
(13), and the transformation mixtures were plated (at 30°C) on LB plates con- 
taining erythromycin (200 µg/ml) and 5-bromo-4-chloro-3-indolyl-β-D-galacto- 
pyranoside (X-Gal; 100 µg/ml). A total of 150 erythromycin-resistant transfor- 
mants were obtained; all were white initially, but after prolonged incubation (up 
to 2 weeks at 4°C), a number had become blue to various extents. Later, we 
discovered that the development of blue color from E. coli colonies (but not 
L. lactis colonies) expressing lacLM is greatly enhanced by adding 1% glycerol to 
the transformation plates (data not shown). Plasmids were isolated from these 
blue colonies, and it was confirmed by restriction enzyme analysis that most of 
these clones had promoter fragments inserted in the multiple cloning site of 
pAK80, in the orientation that would direct transcription into the β-galactosidase 
gene (lacLM). The 46 colonies isolated had become blue to various extents; 29 
from cloning strategy A (containing plasmids pCP1 through pCP29) and 17 from 
strategy B (containing plasmids pCP30 through pCP46) were picked for further 
analysis. The two weakest promoter clones, pCP31 and pCP43, did not contain 
a promoter fragment, and four promoter clones, pCP18, pCP19, pCP33, and 
pCP44, turned out to be identical to pCP27, pCP22, pCP35, and pCP45, respec- 
tively. Indeed, the activities of these sets were almost identical, which also 
demonstrates the reproducibility of the assay used here. The chance that two 
identical sequences would have arisen by coincidence during the oligonucleotide 
synthesis is of course negligible, and these four clones must therefore be the 
result of a cell division that took place after the plasmids were transformed but 
before the cells were plated. 

Transformation of L. lactis. Cells of L. lactis subsp. cremoris MG1363 (4) were 
made competent by growth overnight in GM17 medium containing 2% glycine as 
described by Holo and Nes (6). Plasmid DNA from the 46 clones described 
above was then transformed into these cells by electroporation (6). The cells 
were allowed to regenerate in SGM17 medium for 2 h and then plated on SR 
plates containing erythromycin (2 µg/ml) and X-Gal (100 µg/ml). 

β-Galactosidase assay. The assay was done as described by Miller (14) and 
modified by Israelsen et al. (7). Cultures carrying the plasmid derivatives of 
pAK80 were grown in rich medium overnight at 30°C. The medium used for 
L. lactis was M17 medium supplemented with erythromycin (2 µg/ml) and 1% 
glucose; for E. coli, LB medium supplemented with erythromycin (200 µg/ml) 
was used. The results presented are averages of measurements of the activities of 
at least three individual cultures of each clone. The standard errors were less 
than 30% for E. coli activities and less than 20% for L. lactis activities. Aliquots 
of 25 to 100 µl of the cultures were used in the β-galactosidase assay except in 
the case of the weakest promoter clones, where up to 2 ml of culture was 
concentrated and used in the assay. 
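
For orientation, the classic Miller calculation can be sketched as follows. This is the textbook formula, not necessarily the exact variant used here: the modification by Israelsen et al. (7) is not described in this text, and the function and parameter names are chosen for the example.

```python
def miller_units(a420: float, a550: float, a600: float,
                 minutes: float, volume_ml: float) -> float:
    """Classic Miller units: 1000 * (A420 - 1.75 * A550) / (t * v * A600).

    a420      absorbance of the o-nitrophenol product
    a550      light scattering by cell debris (corrected by the factor 1.75)
    a600      culture density at the time of sampling
    minutes   reaction time
    volume_ml culture volume used in the assay
    """
    if minutes <= 0 or volume_ml <= 0 or a600 <= 0:
        raise ValueError("time, volume, and A600 must be positive")
    return 1000.0 * (a420 - 1.75 * a550) / (minutes * volume_ml * a600)


# Hypothetical reading: 0.1 ml of culture (A600 = 0.5) assayed for 10 min.
example = miller_units(a420=0.5, a550=0.0, a600=0.5, minutes=10, volume_ml=0.1)
print(example)
```

Because the culture volume appears in the denominator, concentrating up to 2 ml of culture for the weakest clones raises the measured absorbance without changing the computed specific activity.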

RESULTS 

The purpose of this work was to generate a library of syn- 
thetic constitutive promoters as a tool for genetic engineering 
of L. lactis. The promoters should cover a wide range of pro- 
moter activities, in small steps of activity changes, so that they 
would be applicable to quantitative physiological studies and 
for metabolic optimization. The following strategy was used: 
(i) design and synthesize a degenerate oligonucleotide se- 
quence that encodes consensus sequences for L. lactis promot- 
ers, separated by spacers of random sequences; (ii) convert this 
mixture of oligonucleotides to double-stranded DNA frag- 
ments, using DNA polymerase and a short oligonucleotide 
primer complementary to the 3' end of the degenerate oli- 
gonucleotide; and (iii) clone this mixture of DNA fragments 
into a promoter probing vector. The idea behind this strategy 
is that even though the consensus sequences should be impor- 
tant elements of an efficient promoter, the context in which the 
consensus sequences are located may modulate the strength of 
the promoters to some extent. 

Design and construction of synthetic promoters for L. lactis. 
A considerable number of promoters have been cloned and 
sequenced from L. lactis (see the review by de Vos and Simons 
[3]). From these data, we extracted extended consensus se- 
quence motifs for L. lactis promoters (Fig. 2A). The Pribnow 
box or the -10 sequence TATAAT and the -35 sequence 
TTGACA, known to be present in many prokaryotic promot- 
ers, are also well conserved for L. lactis. In addition, the se- 
quence TG is often found 1 bp upstream of the -10 sequence; 
it is also possible to determine a consensus sequence for the 4 
bp immediately upstream of the -35 motif, ATTC. Nilsson 
and Johansen (16) found well-conserved sequences among 
promoters of the rRNA operons: AGTTT at position -44 and 
GTACTGTT at positions +1 to +8. In addition to these mo- 
tifs, two semiconserved base pairs were included, R (= A or G) 
upstream of the -10 sequence and W (= A or T) at position 
-3. Based on these data, we designed an oligonucleotide 
which also encodes recognition sites for multiple restriction 
enzymes (Fig. 2B). This mixture of oligonucleotides was con- 
verted to double-stranded DNA fragments, using a short 
primer complementary to the 3' end. Finally, the resulting 
double-stranded DNA fragments, encoding potential pro- 
moter structures, were cloned into the polylinker on the pro- 
moter probe vector, pAK80 (7), upstream of the promoterless 
β-galactosidase gene, using E. coli as a host; this resulted in 
plasmids pCP1 through pCP46. 

(A) 5' CATNNNNN AGTTT ATTC TTGACA NNNNNNNNNNNNNN TGR TATAAT ANNWNA GTACTGTT 3' 
    Labeled positions: -44, -35, -15, -10, +1 

(B) 5' CGGGATCCTTAAGAATATTATGCATNNNNN AGTTT ATTC TTGACA NNNNNNNNNNNNNN TGR TATAAT ANNWNA GTACTGTT AACTGCAGCTGAATTCGG 3' 
    Labeled positions: -44, -35, -15, -10, +1 
    Labeled restriction sites: BamHI, SspI, NsiI, HincII, PvuII 

FIG. 2. Oligonucleotide sequence used for the generation of a library of 
synthetic promoters for L. lactis. (A) Consensus sequence for L. lactis promoters 
derived from data published in the literature. N = 25% each A, C, G, and T; R = 
50% each A and G; W = 50% each A and T. (B) The design of the oligonu- 
cleotide. The sequence contains a number of recognition sequences for restric- 
tion endonucleases, for use in the subsequent cloning strategy. Note that the 
sequence from positions +1 to +8, which is a putative stringent response site, 
can be deleted in the cloning process if necessary. See text for further details. 
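
The degenerate design can be illustrated in silico. The sketch below assumes the 60-nt template of Fig. 2A (positions -52 to +8) and equal base frequencies at each degenerate position, as stated in the legend; the names `IUPAC`, `TEMPLATE`, `sample_promoter`, and `matches_template` are introduced for this example only.

```python
import math
import random

# Bases allowed at each symbol, following the legend of Fig. 2.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "R": "AG", "W": "AT"}

# Consensus template, assembled segment by segment (60 nt in total).
TEMPLATE = ("CAT" + "N" * 5 + "AGTTT" + "ATTC" + "TTGACA"
            + "N" * 14 + "TGR" + "TATAAT" + "ANNWNA" + "GTACTGTT")


def sample_promoter(template: str = TEMPLATE, rng=random) -> str:
    """Draw one sequence, choosing uniformly at every degenerate position."""
    return "".join(rng.choice(IUPAC[symbol]) for symbol in template)


def matches_template(seq: str, template: str = TEMPLATE) -> bool:
    """True if seq has the template length and an allowed base at each position."""
    return len(seq) == len(template) and all(
        base in IUPAC[symbol] for base, symbol in zip(seq, template))


# Number of distinct sequences the degenerate oligonucleotide can encode.
n_variants = math.prod(len(IUPAC[symbol]) for symbol in TEMPLATE)

candidate = sample_promoter(rng=random.Random(0))  # seeded for repeatability
print(candidate, matches_template(candidate), n_variants)
```

With 22 fully random positions plus one R and one W, the template encodes 4^23 sequences, so a few dozen clones sample only a tiny fraction of the possible spacers.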



Activities of the synthetic promoters in L. lactis. Plasmids 
pCP1 through pCP46 were then transformed into L. lactis 
subsp. cremoris MG1363. The different plasmids gave rise to 
colonies exhibiting very different intensities of blue on plates 
containing X-Gal. The specific activities of β-galactosidase in 
liquid cultures of these clones were then determined (Fig. 3) 
and found to vary from 0.3 Miller unit, or from slightly above 
the activity found with the cloning vector pAK80 without any 
insert, to up to more than 2,000 Miller units. Together, the 
promoters covered 3 to 4 logs of promoter activities in small 
steps of activity change. 

Sequence analysis of the CP promoters. A very interesting 
point is the molecular basis for the differences in strength of 
the CP promoters, and we therefore took on the task of se- 
quencing the promoter clones. Eighteen clones were perfect in 
the sense that they had the DNA sequence that was specified 
by the oligonucleotide (Fig. 4). The activities of these 18 pro- 
moter clones covered, in small steps of activity change, a 50- 
fold range of activity, from 34 up to 1,800 Miller units. Four of 
the CP promoters had a 16-bp spacer between the -35 and 
-10 sequences instead of the 17 bp specified in the oligonu- 
cleotide sequence, and the activities carried by these four 
clones were weak, ranging from 0.7 to 12 Miller units. Four 
clones had base pair changes in the -35 sequence, and two had 
base pair changes in the -10 sequence; those clones also had 
rather weak activity (0.3 to 69 Miller units). 
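
The three criteria used above (intact -35 hexamer, intact -10 hexamer, 17-bp spacer) can be applied mechanically to an aligned clone sequence. The sketch assumes the 60-column alignment of Fig. 4, with "-" marking a deleted base; the two example sequences were transcribed from that figure, and the names `classify`, `CP6`, and `CP18` are illustrative.

```python
MINUS35 = "TTGACA"
MINUS10 = "TATAAT"


def classify(aligned: str) -> dict:
    """Check the -35 and -10 hexamers and count spacer bases.

    `aligned` is a 60-column row covering positions -52 to +8; columns
    17-22 hold the -35 hexamer, 23-39 the spacer, and 40-45 the -10 hexamer.
    """
    spacer = aligned[23:40].replace("-", "")  # gaps are deletions, not bases
    return {
        "minus35_ok": aligned[17:23] == MINUS35,
        "minus10_ok": aligned[40:46] == MINUS10,
        "spacer_len": len(spacer),
    }


# Transcribed from Fig. 4: CP6 (1,876 Miller units) and CP18=CP27 (0.7 Miller units).
CP6 = "CATGTGGGAGTTTATTCTTGACACAGATATTTCCGGATGATATAATAACTGAGTACTGTT"
CP18 = "CATTTTGCAGTTTATTCTTGACATTGTGTGCTTCGGGTG-TATAATA-CTAAGTACTGTT"

print(classify(CP6))
print(classify(CP18))
```

Under these assumptions the strong clone shows both hexamers and a 17-bp spacer, whereas the weak clone differs only in having a 16-bp spacer.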

Some clones had 1-bp deletions or a base pair change out- 
side the -35 to -10 region or had been subject to other 
cloning artifacts. However, the activities of these promoter 
clones were all within the range covered by the perfect clones, 
i.e., activities from 58 to 2,050 Miller units, which indicates that 
in this case, consensus sequences outside the -35 to -10 
sequence are of little importance with respect to determining 
the promoter strength. 

FIG. 3. Library of synthetic promoters for L. lactis. Promoter activities (Miller units) were assayed from the expression of a reporter gene (lacLM) encoding 
β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which 
promoter clones contain errors in either the -35 or the -10 consensus sequence or in the length of the spacer between these sequences. 

Promoter    Sequence (5' to 3')                                           Specific β-gal activity 
                                                                          L. lactis  E. coli 

Oligo seq.  CATNNNNNAGTTTATTCTTGACANNNNNNNNNNNNNNTGRTATAATANNWNAGTACTGTT 
CP24        CATGGGTAAGTTTATTCTTCACACTATCTGGGCCCGATGGTATAATAAGTGACTACTGTT     0.3      3.1 
CP18=CP27   CATTTTGCAGTTTATTCTTGACATTGTGTGCTTCGGGTG-TATAATA-CTAAGTACTGTT     0.7      0.2 
CP37        CATCATTAAGTTTATTCTTCACATTGGCCGGAATTGTTG-TATAATACCTTAGTACTGTT     2.2     16 
CP17        CATTCTGGAGTTTATTCTTGAC-CGCTCAGTATGCAGTGGTATAATAGTACAGTACTGTT     2.7     10 
CP2         CATTTGCTAGTTTATTCTTGACATGAAGCGTGCCTAATGGTATAITACTTGAGTACTGTT     4.9      2.8 
CP4         GATGTTTTAGTTTATTCTTGACACCGTATCGTGCGCGTGATATAATCGGGATCCTTAAGA     5.1      1.3 
CP44=CP45   CATCGGGTAGTTTATTCTTGACA-TTAAGTAGAGCCTGATATAATAGTTCAGTACTGTT     12       34 
CP1         CATACCGGAGTTTATTCTTGACAGTTCCACCTCGGGTTGATATAATATCTCAGTACTGTT    34        3.1 
CP19=CP22   CATCGCTTAGTTP-TTCTTGACAGGAGGGATCCGGGTTGATATAATA-GTTAGTACTCTT    58        3.3 
CP34        CATCGCGAAGTTTATTCTTCACACACCGCAGAACTTGTGGTATAATACAACAGTACTGTT    59        8.4 
CP20        CATGGGTGAGTTTATTCTTGACAGTGCGGCCGGGGGCTGATAT£ATAGCAGAGTACTATT    60       22 
CP11        CATAAGTGAGTTTATTCTTCACCCGGACGCCCCCCTTTGATATAATAAGT-AGTACTGTT    69        1.4 
CP26        CATTCTACAGTTTATTCTTGACATTGCACTGTCCCCCTGGTATAATAACTATACATGCAT    72       10 
CP3         CATCCTGTAGTTTATTCTTGACACAAGTCGTTAGCTGTGGTATAATAGGAGACTACTCTT    74        0.9 
CP14        CATGACGGAGTTTATTCTTCACACAGGTATGGACTTATGATATAATAAAACAGTACTGTT    81        0.3 
CP13        CATGCTTTACTTTATTCTTGACAAAACCACCAGCTTTTGGTATAATACGTGAGAACTGTT    84        1.0 
CP40        CATAGAACAGTTTATTCTTGACATTGAATAAGAAGGCTGATATAATAGC-CACTACTGTT   104       19 
CPS         CATTCTTTAGTTTATTCTTGACAAACGTATTGAGGACTGATATAATAGGTGAGTACTGTT   146        1.2 
CP28        CATGGGGCCGTTTATTCTTGACAACGGCGAGCAGACCTGGTATAATAATATAGTACTGTT   161        0.9 
CP10        CATGGCTTAGTTTATTCTTGACAGGGTAGTATCACTGTGATATAATAGGACAGTACTGTT   181        2.5 
CP32        CATACGGGAGTTTATTCTTGACATATTGCCGGTGTGTTGGTATAATAACTTAGTACTGTT   214       60 
CP30        CATGACAGAGTTTATTCTTGACAGTATTGGGTTACTTTGGTATAATAGTTGAGTACTGTT   228       42 
CP9a        CATAGTCTAGTTTATTCTTGACACGCGGTCCATTGGCTGGTATAATAATTTAGTACTGTT   243       56 
CP38        CATAGAGAAGTTTATTCTTGACAGCTAACTTGGCCTTTGATATAATACATGAGTACTGTT   244       92 
CP46        CATGATGTAGTTTATTCTTGACACTGAGAGGGCCTCTTGATATAATAGTTGACTACTGTT   256       33 
CP23        CATGTAGGAGTTTATTCTTGACAGATTAGTTAGGGGGTGGTATAATATCTCAGTACTGTT   257        2.6 
CP39        CATTGCGAAGTTTATTCTTGACAGTACGTTTTTACCATGATATAATAGTATAGTACTCTT   266        0.5 
CP33=CP35   CATGTTGGAGTTTATTCTTGACATACAATTACTGCAGTGATATAATAGGTGAGTACTCTT   298        7.0 
CP15        CATTACGTAGTTTATTCTTGACAGAATTACGATTCGCTGGTATAATATATCAGTACTGTT   322        1.5 
CP29        CATCGGTAAG-TTATTCTTGACATCTCAGGGGGGACGTGGTATAATAACTGAGTACTGTT   399        1.4 
CP12b       CATATACAAGTTTATTCTTGACACTAGTCGGCCAAAATGATATAATACCTGAGTACTGTT   419      101 
CP41        CATCCGCAAGTTTATTCTTGACAGCTGAATGTAGACGTGGTATAATAGTTAAGTACTGTT   624       18 
CP16        CATTGTGTAGTTTATTCTTGACAGCTATGAGTCAATTTGCTATAATA--ACAGTACTCAG   627        0.3 
CP42        CATTCGTAAGTTTATTCTTGACACCTGAGATGAGGCGTGATATAATAAATAAGTACTGTT   680        7.2 
CP7         TATGCGGTAGTTTATTCTTGACATGACGAGACAGGTGTGGTATAATGGGTCTAGATTAGG  1021      134 
CP6         CATGTGGGAGTTTATTCTTGACACAGATATTTCCGGATGATATAATAACTGAGTACTGTT  1876      280 
CP25        C-TTTGGCAGTTTATTCTTGACATGTAGTGAGGGGGCTGGTATAATCACATAGTACTGTT  2050      528 

FIG. 4. Sequence of the area from positions -52 to +8 (relative to the putative transcription initiation site) of the synthetic promoter clones pCP1 through pCP46. 
The clones are ordered according to strength. Matches to the oligonucleotide consensus sequence (given at the top) are in boldface. Errors in the -35 or -10 consensus 
sequence and deletions in the spacer between these sequences are underlined. Two clones, CP9 and CP12, had two promoter fragments inserted in tandem, a (upstream 
fragment) and b (downstream fragment). In these cases, only one of the two tandem promoters was perfect; data for these promoters are shown. β-gal, β-galactosidase. 



Regulation of promoter activities. The synthetic CP promot- 
ers were designed to be constitutive. To test this experimen- 
tally, the expression in exponential growth phase and station- 
ary growth phase was measured for a selection of the promoter 
clones. We found that the specific activity of β-galactosidase 
was two- to fourfold higher in the stationary-phase cultures 
than in the exponential-phase cultures (data not shown). How- 
ever, the copy number of the vector used in these studies has 
been shown to increase approximately threefold in the station- 
ary phase (11), which demonstrates that the CP promoters are 
indeed quite close to being constitutive under these conditions. 

Activities of the synthetic promoters in E. coli. Another 
interesting point is whether the promoters are functional in 
other organisms, and if so, whether the relative strength of the 
promoters would be dependent on the organism. As described 
above, the promoter cloning vector, pAK80, that we used here 
for construction of the synthetic promoters also replicates in 
E. coli; indeed, the promoter clones were first isolated in E. 
coli. We could therefore measure the activities of the synthetic 
promoters also in E. coli (Fig. 5). The promoter strength was 
also highly variable for the individual promoters in this organ- 
ism, and we found that the promoters covered activities from 
0.2 to 500 Miller units. In this case also, the activity increased 
in small steps. 

The absolute values of β-galactosidase units measured in 
E. coli were lower on average compared to L. lactis; this was 
probably a consequence of a low efficiency of translation of the 
lacL and lacM genes in E. coli, since these genes and their 
ribosome binding sites originate from the gram-positive bacte- 
rium Leuconostoc mesenteroides. When some of the strongest 
promoters were cloned into a promoter cloning vector de- 
signed for E. coli, the promoters turned out to be quite strong 
(data not shown). 

Figure 6 shows a plot of activity of the CP promoters in 
L. lactis and E. coli. The strengths of the individual CP pro- 
moters in the two organisms correlate somewhat but not very 
well: some promoters which were quite strong in L. lactis were 
relatively weak in E. coli, and vice versa. Moreover, the pattern 
that we observed in L. lactis, i.e., that the relatively strong 
promoters were the perfect ones, did not hold true for E. coli: 
here the promoters which had either an error in the consensus 
sequence or a shorter spacer were relatively strong. 
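
One way to put a number on "correlate somewhat but not very well" is a rank correlation between the two hosts. The sketch below is not an analysis from the study: it applies Spearman's coefficient to a subset of the activity pairs listed in Fig. 4, and the helper names `ranks` and `spearman` are chosen for the example.

```python
from statistics import mean


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# (L. lactis, E. coli) activities in Miller units for a subset of Fig. 4:
# CP24, CP2, CP34, CP26, CP14, CP10, CP32, CP46, CP39, CP41, CP42, CP25.
lactis = [0.3, 4.9, 59, 72, 81, 181, 214, 256, 266, 624, 680, 2050]
coli = [3.1, 2.8, 8.4, 10, 0.3, 2.5, 60, 33, 0.5, 18, 7.2, 528]

rho = spearman(lactis, coli)
print(round(rho, 2))
```

A coefficient near 1 would mean the two hosts rank the promoters identically; values well below 1 reflect the reshuffling described above.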

DISCUSSION 

We have constructed a library of synthetic promoters that 
differ in strength over 3 to 4 logs of activity, and this range of 
activity is covered by small steps of activity increase. Moreover, 
some of the promoters that resulted from this random ap- 
proach turned out to be quite strong. 

The fact that the library of promoters covered such a wide 
range of activities was somewhat surprising to us; the under- 
lying idea behind the construction of the CP promoters was 
that the context of the consensus sequences (the spacers) 
would play a role in modulating the strength of a promoter, 
rather than changing the activity over several logs of activity. 
Indeed, much of that variation (below 5 Miller units) was 
probably a consequence of the accidental introduction of mu- 
tations in the consensus sequences and in the length of the 
spacer regions. In contrast, the strong promoters in L. lactis
(those having activities higher than 100 Miller units) were all 
perfect with respect to the consensus sequence and spacer 
length. But even when we confine our analysis to these pro- 
moter clones, we find 400-fold variation in promoter activity, 
still in small steps of activity increase, which demonstrates that 
the context in which the consensus sequences are embedded 
(i.e., the spacers) clearly is important for promoter strength. 

FIG. 5. β-Galactosidase activities of the CP promoters in E. coli. The promoter activities were assayed from the expression of a reporter gene (lacLM) encoding
β-galactosidase transcribed from the different synthetic promoter clones on the promoter cloning vector pAK80. The patterns of the data points indicate which
promoter clones contained errors in either the -35 or the -10 consensus sequence or in the length of the spacer between these sequences. See text for further details.

The ranking of the promoters depended on the organism in
which they were measured, possibly because the σ factor-RNA
polymerase complexes that recognize these promoters have
different structures in the two organisms due to differences in
amino acid sequences. The fact that E. coli accepted some of
the less perfect CP promoters as relatively strong promoters
could indicate that E. coli is more promiscuous with respect to
promoter structure than L. lactis. This makes some sense con-
sidering the composition of the L. lactis genome: the AT con-
tent is 65%, which is much closer to the base composition of
the -35 and -10 consensus sequences. These sequences are
therefore more likely to accidentally occur in L. lactis, and a
stricter requirement for promoter sequences might therefore
be expected for this organism.

The process of transcription initiation consists of several 
events (reviewed in reference 17). First, recognition and bind-
ing of the σ factor-RNA polymerase complex to the promoter
region takes place (closed complex formation). Subsequently,
there is local melting of the DNA double helix (open complex
formation), possibly assisted by local negative DNA supercoil-
ing. Finally, the binding between the σ factor-RNA polymerase
complex and the promoter area must dissociate and clear the
promoter area, so that another initiation complex may form.
From this model, it is clear that efficient binding between the
σ factor-RNA polymerase complex and the promoter area
does not guarantee a strong promoter; promoter strength must 
be a compromise between binding, melting, and clearance, and 
probably other factors as well. 

What then controls the strength of the individual synthetic 
promoters presented here? It does not appear that any addi- 
tional conserved sequence motifs have been generated among 
the strongest promoters. Rather, it seems that the overall 
three-dimensional structure which arises from a particular nu- 
cleotide sequence could be important. 

FIG. 6. Correlation between promoter activities in L. lactis and E. coli. The
promoter activities measured in E. coli (from Fig. 5) were plotted as a function
of the promoter activities measured in L. lactis (from Fig. 3). The symbols
indicate errors in either the -35 or -10 sequence (solid circles), a 16-bp spacer
(triangles), or promoters with both of these errors (diamonds). The open square
represents the vector clone.

The method presented here for tuning gene expression in
the living cell has both advantages and disadvantages com-
pared to the methods that would use an inducible expression
system such as the lac promoter. A disadvantage is that instead
of only one genetic construct, perhaps three to four constructs
have to be made. On the other hand, the constructs are made
in parallel, so that the amount of work should not be propor-
tional to the number of constructs. The inducible systems have
the advantage that gene expression can be turned on at the 
proper time during a fermentation, which is sometimes essen- 
tial (for instance, when the product is toxic to the host cell). 
The work presented here was aimed at generating a library of 
constitutive promoters, for achieving a constant level of gene 
expression throughout the growth of a culture. We are cur- 
rently working on synthetic inducible promoters in which a 
regulatory motif has been added. This should allow us to gen- 
erate libraries of promoters, which differ in basal expression 
level and can be induced to various extents, by changing a 
fermentation parameter (i.e., temperature, pH, or salt concen- 
tration) or by adding a specific inducer. 

The system presented here also has advantages. One is that 
it is easier to attain a steady expression level of the enzyme in 
question, which is often quite difficult with inducible systems 
such as the lac system (8). With the method presented here, 
once the optimal expression level of the enzyme has been 
determined, the engineered strain is ready to use directly in the 
fermentation process. 

An important feature of the system described here, in a 
longer perspective, is the possibility to simultaneously modu- 
late, to different extents, the expression of several individual 
genes or operons located at various positions of the genome in 
the same strain. Metabolic control analysis (5, 10) showed that 
in theory, flux and concentration control can be shared among 
several enzymes in a pathway, and experimental determina- 
tions of flux control have often showed that control seems to be 
distributed over many enzymes in the living cell (9, 15, 18, 19, 
22, 23): in most cases, there may not be such a thing as a 
rate-limiting step, and even if one finds a step that has a 
measurable control, the control will often disappear relatively 
quickly as the enzyme is being overexpressed. Since the sum of 
flux control must equal unity, this then means that flux control 
has been shifted to other steps in the pathway. In summary, in 
order to increase a given flux in a living cell, it may thus be 
necessary to (i) optimize the individual expression of several 
genes and (ii) after one round of optimization in which one 
enzyme was clamped at the optimal level, continue the opti- 
mization of other enzymes in the pathway. With the systems 
available until now, one would then quickly run out of expres- 
sion systems to use, but with our method, one can in principle 
continue the optimization numerous times. 
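
The summation argument in the preceding paragraph can be illustrated with a toy calculation. The sketch below is not taken from the paper: it assumes a hypothetical linear pathway in which each step acts like a resistance inversely proportional to its enzyme level, so that the flux equals a driving force divided by the summed resistances. The function names and enzyme levels are illustrative assumptions only.

```python
def flux(enzyme_levels, driving_force=1.0):
    """Steady-state flux of a toy linear pathway (assumed model).

    Each step i is treated as a resistance 1/e_i, so the flux is the
    driving force divided by the sum of the resistances.
    """
    return driving_force / sum(1.0 / e for e in enzyme_levels)


def flux_control_coefficients(enzyme_levels, rel_step=1e-6):
    """Estimate C_i = (dJ/J) / (de_i/e_i) by a small relative perturbation."""
    base = flux(enzyme_levels)
    coefficients = []
    for i, level in enumerate(enzyme_levels):
        perturbed = list(enzyme_levels)
        perturbed[i] = level * (1.0 + rel_step)
        coefficients.append((flux(perturbed) - base) / base / rel_step)
    return coefficients


# Three enzymes at equal levels share control equally in this model.
balanced = flux_control_coefficients([1.0, 1.0, 1.0])

# Overexpressing the first enzyme tenfold shifts control to the other steps.
overexpressed = flux_control_coefficients([10.0, 1.0, 1.0])

if __name__ == "__main__":
    print([round(c, 3) for c in balanced], round(sum(balanced), 3))
    print([round(c, 3) for c in overexpressed], round(sum(overexpressed), 3))
```

In this assumed model the coefficients sum to one in both cases, and the coefficient of the overexpressed enzyme falls while those of the remaining steps rise, which is the behaviour the text describes qualitatively.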

In this report, the method for generating synthetic promot- 
ers of different strengths was illustrated for use in the gram- 
positive bacterium L. lactis. However, there is no obvious rea- 
son why the approach should be limited to this organism, and 
the fact that the same promoter library was also functional in 
the gram-negative bacterium E. coli suggests that the approach 
may be universally applicable to prokaryotic organisms. An 
exciting question is then, can the approach be extended to 
work for modulating gene expression in eukaryotic cells? Such 
experiments are under way, and the results are quite encour- 
aging. 

Isolation and characterization of mutants of firefly luciferase which
produce different colors of light 



Naoki Kajiyama and Eiichi Nakano 
Noda-city, Chiba 278, Japan

The luciferase cDNA from the 'Genji' firefly, Luciola cruciata,
was mutated with hydroxylamine to isolate mutant luciferases.
Some of the isolated mutant enzymes produced different
colors of light, ranging from green to red. Five such mutants,
producing green (λmax = 558 nm), yellow-orange (λmax =
595 nm), orange (λmax = 607 nm) and red light (λmax = 609
and 612 nm), were analyzed. The mutations were found to
be single amino acid changes, from Val239 to Ile, Pro452 to
Ser, Ser286 to Asn, Gly326 to Ser and His433 to Tyr
respectively.

Key words: color mutant/firefly luciferase/random mutagenesis/
wavelength of maximum intensity 



Introduction 

Firefly luciferase catalyzes the production of light from luciferin 
in the presence of ATP, Mg2+ and molecular oxygen (Deluca
and McElroy, 1978). This enzyme efficiently converts chemical
energy into light with a quantum yield of 0.88 (Seliger and
McElroy, 1960). Due to its high sensitivity and extreme speci-
ficity for ATP, luciferase has been used for assay of ATP in
various biological samples (Ludin, 1981). 

The luciferase cDNA from the Japanese firefly, Luciola
cruciata ('Genji-botaru' in Japanese), has been cloned and
analyzed in our laboratory (Masuda et al., 1989). The primary
structure of this luciferase deduced from the nucleotide sequence
was shown to consist of 548 amino acids, with a total molecular
weight of 60 024. This luciferase catalyzes a reaction that
produces yellow-green light (λmax = 562 nm), which is the
same as that emitted by the North American firefly (Seliger and
McElroy, 1964). It has been shown that the colors of light emitted
by fireflies vary among species from green to yellow (λmax =
543-582 nm) (Seliger and McElroy, 1964). Since the substrate
(D-luciferin) is the same for all species, the differences in the
color of the light must be due to variations in the structure of
the enzymes (McElroy and Seliger, 1966). Recently, cDNAs of
luciferase from the bioluminescent click beetle, Pyrophorus
plagiophthalamus, were cloned and their nucleotide sequences
determined (Wood et al., 1989a). These cDNA clones code for
luciferases of four different types, distinguished by the colors
of their bioluminescence. The amino acid sequences of these
luciferases are 95-99% identical, and less than two or three
amino acid changes are needed for the spectral shift in the color
(Wood et al., 1989b,c).

In the course of mutagenesis studies of luciferase cDNA from 
Luciola cruciata, we found that some mutants emitted different 
colors of light. Sequence analysis of these mutants revealed that 
the mutations were single amino acid changes. 



Materials and methods 

Plasmid, Escherichia coli strain and media 
Plasmid pGLf37 was constructed from pCSJ-fl by Mr H.Tatsumi
in our laboratory (Masuda et al., 1989). Escherichia coli strain
JM101 (supE, thi, Δ(lac-pro), [F', traD36, lacIqZΔM15, proAB])
(Yanish-Perron et al., 1985) was used for the expression of
luciferase cDNA. The E.coli cells were grown in LB broth (1%
Difco tryptone/0.5% yeast extract/0.5% sodium chloride), and
50 µg/ml ampicillin was added when necessary.
Mutagenesis and screening of 'color' mutants
Plasmid pGLf37 containing Genji-firefly luciferase cDNA was
treated, according to the methods of Kironde et al. (1989), with
0.8 M hydroxylamine/0.1 M sodium phosphate/1 mM ethylene-
diaminetetraacetic acid (EDTA), pH 6.0, for 2 h at 65°C (Figure
1). The mutagen-treated plasmid was precipitated with ethanol
and redissolved in 10 mM Tris-HCl/1 mM EDTA, pH 8.0,
followed by transformation into E.coli JM101. After 12 h at
37°C, colonies on LB/ampicillin plates were transferred to
nitrocellulose filters. The filters were soaked with 0.5 mM
luciferin in 100 mM sodium citrate buffer, pH 5.0 (Wood and
DeLuca, 1987), and the colors of bioluminescence emitted by
the colonies were monitored.
Purification of luciferase 

Escherichia coli JM101 cells harboring the mutant plasmid were
cultured in 3 ml of LB broth containing ampicillin at 37°C for
12 h. The cultures, 2 ml each, were inoculated into 100 ml of
LB broth containing ampicillin. After growth at 37°C for 6 h,
the cultures were harvested. Escherichia coli pellets were
resuspended in lysis buffer (100 mM potassium phosphate, pH
7.8/2 mM EDTA/1 mg lysozyme per ml), incubated on ice for
15 min and then frozen on dry ice. The frozen pellets were
allowed to thaw at 25°C and cleared by centrifugation.

The lysates of E.coli were fractionated with ammonium sulfate;
the fraction precipitated between 0.3 and 0.6 saturation was
saved. The precipitate was dissolved with 25 mM Tris-HCl
buffer, pH 7.8/1 mM EDTA/10%-saturated ammonium sulfate,
and then loaded on an Ultrogel AcA34 gel filtration column
(LKB). The active fraction was applied to a hydroxyapatite
column (Tosoh, Tokyo, Japan), followed by elution with a
10-100 mM sodium phosphate gradient.
DNA sequencing 

Various restriction fragments derived from mutant luciferase 
cDNA were subcloned into pUC118 or pUC119, and sequenced
using a DNA sequencer model 373A (Applied Biosystems).

Results and discussion 
Screening of 'color' mutants

As shown in Figure 1, pGLf37, which is a plasmid directing the 
synthesis of active luciferase in E.coli under control of the trp 
promoter, was treated with hydroxylamine solution for 2 h at 
Fig 1. Strategy of mutation. Plasmid pGLf37 containing Genji firefly
luciferase cDNA was treated according to Materials and methods. Clear and
solid portions show the luciferase cDNA and trp promoter respectively.
Abbreviations: E, EcoRV; P, PstI; S, SspI; Ap, ampicillin resistance gene.
Steps shown: hydroxylamine treatment; transformation of E.coli; soak with
luciferin solution; isolation of 'color' mutants.

Table I. Wavelength of maximum intensity of light from wild-type and
mutant luciferases

Luciferase   Color*          λmax (nm)
                             pH 7.8    pH 6.0
Genji        yellow-green    562       609
C-M-1        orange          607       614
C-M-2        red             609       611
C-M-3        red             612       612
C-M-4        yellow-orange   595       609
C-M-6        green           558       558
C-M-11       yellow          565       612

The spectra were measured with a IMUC-7000 intensified multichannel
photodetector at pH 7.8 and pH 6.0. Except for pH, the condition for
luminescence was the same as described in Figure 2.
*Color was confirmed at pH 7.8.



Table II. DNA sequence and amino acid sequence change in mutants

Mutant   Color           Base change   Amino acid change
C-M-1    orange          G857 → A      Ser286 → Asn
C-M-2    red             G976 → A      Gly326 → Ser
C-M-3    red             C1297 → T     His433 → Tyr
C-M-4    yellow-orange   C1354 → T     Pro452 → Ser
C-M-6    green           G715 → A      Val239 → Ile



Fig 2. The color of bioluminescence emitted by the wild-type and mutant
luciferases. Left to right: yellow-green (wild-type luciferase); yellow-orange;
orange; red; green; yellow. These luciferases were purified according to
Materials and methods. Ten microliters of these luciferases were added to
400 µl of substrate mix (25 mM glycylglycine, pH 7.8/5.4 mM
MgSO4/0.086 mM luciferin/2 mM ATP) to confirm the bioluminescence.

65°C. The plasmid was transformed into E.coli JM101. To
monitor the colors of bioluminescence, the transformants were
soaked with luciferin solution. Subsequently, we isolated several
mutants, producing colors of light varying from green to red.
Bioluminescence spectra of mutant luciferases 
Mutant luciferases were purified to homogeneity as described 
in Materials and methods. Figure 2 shows the colors of light 
emitted by the purified enzymes. When their spectra were 
measured with a IMUC-7000 intensified multichannel photo- 
detector (Otsuka Denshi, Osaka, Japan), the wavelengths of 
maximum intensity were 558 nm for green, 565 nm for yellow,
595 nm for yellow-orange, 607 nm for orange and 609 and 612
nm for red (Table I).

It is known that luciferase from the American firefly produces
light with a peak intensity at ~560 nm (yellow-green) under
optimal conditions (Seliger and McElroy, 1964). However, this
peak can be affected by temperature, pH and metal ions. At low
pH or in the presence of heavy metals, the emission peak is shifted
toward the red, showing an emission of ~615 nm, but with
a marked decrease in the quantum yield of the reaction (Seliger
and McElroy, 1960, 1964). This phenomenon is also observed
for Genji firefly luciferase. In the mutants C-M-1, 2, 3, 4 and
11, by contrast, the spectral peaks were shifted toward longer
wavelengths under optimal conditions, with no detectable
decrease of light intensity. Moreover, for mutants C-M-3 (red)
and C-M-6 (green) there was no pH effect on emission spectra
(Table I). For the other red (C-M-2), orange (C-M-1) and yellow-
orange (C-M-4) mutants, a shift of the spectral peak at pH 6.0 was
detected, but it was slight and not as large as for Genji luciferase.
On the other hand, for the yellow mutant C-M-11, the shift was
the largest among the mutant luciferases and its reaction decay was similar
to that for Genji luciferase. In contrast, the reaction decay in the
C-M-3 and C-M-6 mutants at pH 6.0 was not as great as that
of Genji wild-type luciferase (data not shown). It is uncertain
whether the color shifts and decay rates of the light output at
low pH are related to each other.
Amino acid sequences of mutants 

Determination of the nucleotide sequences in some mutant
luciferases (green, yellow-orange, orange and red) was carried
out to identify the base changes present. As shown in Table II,
the change in the yellow-orange mutant was found to be from
CCA to TCA, resulting in a change of Pro to Ser at position
452. In the orange mutant, the alteration was from Ser to Asn
at position 286. In the two red mutants, the changes for C-M-2
and C-M-3 were identified as Gly326 to Ser and His433 to Tyr
respectively. The green mutant contained a change of Val to Ile
at position 239. These results indicate that only a single amino
acid substitution in a luciferase molecule is enough to produce
the change in bioluminescence color.
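
The base changes and amino acid changes reported in Table II can be cross-checked against the standard genetic code. The sketch below is illustrative and not part of the original paper. It assumes that nucleotide 1 is the first base of the initiation codon (the paper does not state its numbering convention) and that the C-M-1 base change reads G857 to A, which is the reading consistent with the GC-to-AT specificity of hydroxylamine. Only the CCA to TCA codon is given explicitly in the text, so for the other mutants the check asks whether any codon for the wild-type residue could explain the reported change. The names `TABLE_II` and `is_consistent` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# (wild-type base, nucleotide position, mutant base,
#  wild-type residue, residue number, mutant residue), read from Table II.
TABLE_II = {
    "C-M-1": ("G", 857, "A", "S", 286, "N"),
    "C-M-2": ("G", 976, "A", "G", 326, "S"),
    "C-M-3": ("C", 1297, "T", "H", 433, "Y"),
    "C-M-4": ("C", 1354, "T", "P", 452, "S"),
    "C-M-6": ("G", 715, "A", "V", 239, "I"),
}


def is_consistent(wt_base, position, mut_base, wt_aa, residue, mut_aa):
    """Return True if some codon for wt_aa explains the reported change.

    Assumes nucleotide 1 is the first base of the initiation codon.
    """
    codon_number = (position - 1) // 3 + 1
    offset = (position - 1) % 3
    if codon_number != residue:
        return False
    for codon, aa in CODON_TABLE.items():
        if aa == wt_aa and codon[offset] == wt_base:
            mutated = codon[:offset] + mut_base + codon[offset + 1:]
            if CODON_TABLE[mutated] == mut_aa:
                return True
    return False


results = {name: is_consistent(*row) for name, row in TABLE_II.items()}

if __name__ == "__main__":
    for name, ok in results.items():
        print(name, "consistent" if ok else "inconsistent")
```

Under these assumptions every nucleotide position falls in the codon of the reported residue, and each reported base change can produce the reported substitution.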

Recently the nucleotide sequences of click beetle luciferase
cDNAs were determined by Wood et al. (1989a). These are of
four different types, distinguishable by the colors of light
produced by the luciferases they code: green, yellow-green,
yellow and orange. Fragments of the four different types were
recombined to construct hybrid luciferases, and two groups of
amino acids, each capable of producing a change in the spectrum
of luciferase greater than 16 nm, were detected (Wood et al.,
1989b). For the first set, which contains the changes Arg223 to
Glu and Leu238 to Val, the spectrum shifts from 560 to 577 nm.
The spectrum for the other set, containing the changes SerUl
to Gly, Asp352 to Val and Ser358 to Thr, shifts from 560 to
580 nm. It is not known whether all the amino acids in each set
are required for the spectral change.

In our experiments the colors of light were shown to be changed
effectively by only one amino acid substitution (Table II). Four
mutants showed an upward spectral shift of >30 nm, and the
wavelength of their maximum light intensity was far longer than
the orange of click beetle luciferase (593 nm). A red mutant (C-
M-3), containing a change of His433 to Tyr, showed an especially
large upward shift of 50 nm to 612 nm.
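
The shifts quoted in the text follow from simple arithmetic on the Table I values. The sketch below is illustrative only. The assignment of values to rows is a reading of the flattened table (seven luciferases, each with a color and two emission maxima, taken in order), and the variable names are hypothetical.

```python
# Emission maxima in nm at pH 7.8 and pH 6.0, as read from Table I.
LAMBDA_MAX = {
    "Genji": (562, 609),
    "C-M-1": (607, 614),
    "C-M-2": (609, 611),
    "C-M-3": (612, 612),
    "C-M-4": (595, 609),
    "C-M-6": (558, 558),
    "C-M-11": (565, 612),
}

WILD_TYPE = "Genji"

# Shift of each mutant relative to wild type at pH 7.8.
shift_vs_wild_type = {
    name: peaks[0] - LAMBDA_MAX[WILD_TYPE][0]
    for name, peaks in LAMBDA_MAX.items()
    if name != WILD_TYPE
}

# Shift caused by lowering the pH from 7.8 to 6.0.
ph_shift = {name: peaks[1] - peaks[0] for name, peaks in LAMBDA_MAX.items()}

large_red_shifts = sorted(n for n, s in shift_vs_wild_type.items() if s > 30)
ph_insensitive = sorted(n for n, s in ph_shift.items() if s == 0)

if __name__ == "__main__":
    print("shift vs wild type (nm):", shift_vs_wild_type)
    print("pH 7.8 to 6.0 shift (nm):", ph_shift)
    print("red shift above 30 nm:", large_red_shifts)
    print("no pH effect:", ph_insensitive)
```

With this reading of the table, four mutants (C-M-1 to C-M-4) are shifted by more than 30 nm, C-M-3 is shifted by 50 nm, and C-M-3 and C-M-6 show no change between the two pH values, in agreement with the text.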

When the sequences of the mutant luciferases were compared
with those of click beetle enzymes, no common amino acid
sequence affecting the color of light was detected. Further, in
the color mutants of firefly luciferase, there was no reversal of
hydrophobicity or a large change in the conformational
parameters of the secondary structure. For example, in the green
mutant, the amino acid alteration was from Val to Ile. Both of
these are hydrophobic, and conformational parameters for the
α-helix and β-sheet together with their charges are also similar
(Chou and Fasman, 1978). When we estimated the secondary
structures of wild-type and mutant luciferases from each of the
primary structures using the algorithm of Garnier et al. (1978),
no drastic structural changes were detected. Thus, it may be con-
cluded that the different colors of bioluminescence were caused
by only subtle differences in the tertiary structure of the luciferase
molecule. It would be of interest to discover whether the color
change, which is a unique parameter for luciferase, is related
to the other enzymatic properties. Further studies, including
examination of the properties of these color mutant enzymes and
analysis of their tertiary structures, would elucidate the relation-
ship between the structure and function of luciferase.

Firefly luciferase has been used for assay of ATP in various
biological samples (Ludin, 1981). The mutant luciferases
described above could be used more effectively for determining
the amount of ATP in colored samples. In the case of red-colored
samples, determination was found to be twice as sensitive using
the mutant enzyme producing red light than with wild-type
luciferase (data not shown).

Hydroxylamine treatment used in this study was a very simple
and efficient method for introducing random base substitutions
into luciferase cDNA. However, since this chemical mutagen
causes only GC to AT transition mutations, only mutants with a
limited range of amino acid substitutions can be obtained. To overcome this
problem, Myers et al. (1985) used the method of treating single-
stranded DNA with nitrous acid, formic acid and hydrazine,
followed by the synthesis of the complementary strand with
reverse transcriptase. Using this method for luciferase, various
color mutants which cannot be found at present may be obtained.
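
The limitation described above can be made concrete by enumerating which amino acid substitutions a single GC-to-AT transition can produce. The sketch below is illustrative and not from the paper. It assumes the standard genetic code and treats a GC-to-AT transition as either G to A or C to T on the coding strand, depending on which strand carried the modified base. The function name `reachable_substitutions` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# A GC-to-AT transition seen on the coding strand: G->A, or C->T when the
# modified base was on the opposite strand (assumption stated in the lead-in).
TRANSITIONS = {"G": "A", "C": "T"}


def reachable_substitutions():
    """Amino acid changes reachable by one G->A or C->T change in a codon."""
    changes = set()
    for codon, wt_aa in CODON_TABLE.items():
        for i, base in enumerate(codon):
            if base not in TRANSITIONS:
                continue
            mutated = codon[:i] + TRANSITIONS[base] + codon[i + 1:]
            mut_aa = CODON_TABLE[mutated]
            # Ignore silent changes and changes involving stop codons.
            if mut_aa != wt_aa and "*" not in (wt_aa, mut_aa):
                changes.add((wt_aa, mut_aa))
    return changes


REACHABLE = reachable_substitutions()

# The five substitutions reported for the color mutants (one-letter codes).
OBSERVED = {("V", "I"), ("P", "S"), ("S", "N"), ("G", "S"), ("H", "Y")}

if __name__ == "__main__":
    print("reachable substitutions:", len(REACHABLE), "of", 20 * 19)
    print("all observed changes reachable:", OBSERVED <= REACHABLE)
```

All five reported substitutions fall within the reachable set, while changes that need a different base substitution (for example Val to Ala, which requires T to C) are not accessible with this mutagen.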

In the present study, we succeeded for the first time in isolating 
several 'color' mutant luciferases. Sequence analysis revealed
that only a single amino acid change in the 548 amino acids of 
luciferase is enough for the color variation. Analysis of other 
mutant luciferases is now in progress. 

ABSTRACT Renilla reniformis is an anthozoan coelenterate
capable of exhibiting bioluminescence. Bioluminescence in Re-
nilla results from the oxidation of coelenterate luciferin (coelen-
terazine) by luciferase [Renilla-luciferin:oxygen 2-oxidoreduc-
tase (decarboxylating), EC 1.13.12.5]. In vivo, the excited state
luciferin-luciferase complex undergoes the process of nonradi-
ative energy transfer to an accessory protein, green fluorescent
protein, which results in green bioluminescence. In vitro, Renilla
luciferase emits blue light in the absence of any green fluorescent
protein. A Renilla cDNA library has been constructed in λgt11
and screened by plaque hybridization with two oligonucleotide
probes. We report here the isolation and characterization of a
luciferase cDNA and its gene product. The recombinant lu-
ciferase expressed in Escherichia coli is identical to native
luciferase as determined by SDS/PAGE, immunoblot analysis,
and bioluminescence emission characteristics.



Renilla reniformis (class Anthozoa) is a bioluminescent soft
coral found in shallow coastal waters of North America, which
displays blue-green bioluminescence upon mechanical stimu-
lation (1, 2). The components involved in Renilla biolumines-
cence have been described in detail (3). Renilla luciferase
[Renilla-luciferin:oxygen 2-oxidoreductase (decarboxylating),
EC 1.13.12.5] catalyzes the oxidative decarboxylation of coel-
enterazine in the presence of dissolved oxygen to yield oxy-
luciferin, CO2, and blue light (λmax = 480 nm) (4). This reaction
has a bioluminescence quantum yield of ^1%. The stoichi-
ometry of this reaction and the detailed mechanism leading to
excited-state formation have been described (4, 5).

The color of in vitro-catalyzed bioluminescence changes
from blue to green upon addition of submicromolar amounts
of an energy-transfer acceptor green fluorescent protein
(GFP), which has been purified from Renilla and character-
ized (6). This green fluorescence (λmax = 509 nm) is identical
to the in vivo emission in Renilla. The energy-transfer pro-
cess is nonradiative; an increase in both the quantum yield (6)
and calculated lifetimes has been determined for this process 
(7). Luciferase and GFP form a specific 1:1 rapid equilibrium 
complex in solution (8). 

The elucidation of mechanisms involved in nonradiative 
energy transfer processes as well as determination of detailed 
structural information on both luciferase and GFP have been 
hindered by a lack of material. To overcome this, we have 
cloned, sequenced, and expressed in Escherichia coli a
cDNA encoding Renilla luciferase.

MATERIALS AND METHODS 

Amino Acid Sequence Determination of Renilla Luciferase.
Native Renilla luciferase was isolated as described (4). Pu-
rified luciferase was digested with staphylococcal protease
V-8 (9). The resulting peptides were purified by HPLC and
subjected to NH2-terminal Edman sequencing as described 
(10). Based on these peptide sequences two 17-base oligo- 
nucleotide probes were synthesized with an Applied Biosys- 
tems DNA synthesizer at the Molecular Genetics Instrumen- 
tation Facility at the University of Georgia. 

Construction of a cDNA Library in λgt11. Live R. reni-
formis were collected at the University of Georgia Marine
Institute located at Sapelo Island. The animals were frozen
immediately in liquid N2 and stored at -80°C. Frozen tissue
was ground to a fine powder in liquid N2 with mortar and
pestle. Total RNA was isolated from the frozen powder by
the guanidine thiocyanate method (11), and poly(A)+ RNA
was isolated by oligo(dT)-cellulose chromatography (12).
cDNA was synthesized by the method of Gubler and Hoff-
man (13). Phosphorylated EcoRI linkers (Collaborative Re-
search) were ligated to the cDNAs, which were then digested
with EcoRI. Separation of cDNA from free linkers after
EcoRI digestion as well as size selection of cDNAs were
accomplished by electrophoresis in low-melting-temperature
agarose (NuSieve, FMC) (14). cDNAs were ligated into the
EcoRI site of λgt11 (15). The library was amplified in Y1088
cells (16) by a plate method (17).

Isolation and DNA Sequence Determination of a Luciferase
cDNA. Oligonucleotide probes were 5' end-labeled with T4
polynucleotide kinase (Bethesda Research Laboratories) and
[γ-32P]ATP (3000 Ci/mmol; 1 Ci = 37 GBq; ICN) to specific
activities >1 x 10* cpm/µg (18). A total of 6 x 10^ recombinant
plaque-forming units were screened by plaque hybridization
(19). Phage DNA was isolated as described (20). A luciferase
cDNA, isolated from the clone λRLuc-6, was subcloned into
the M13 sequencing vectors mp18 and mp19, and sequencing
templates were prepared (21). The DNA sequence of both
strands was determined by the dideoxynucleotide chain-
termination technique by using a Sequenase kit (United States
Biochemical) and [α-35S]dATP (400 Ci/mmol; Amersham)
(22). The M13 universal primer and a λgt11 sequencing primer
(Amersham) were used to prime the sequencing reactions.

Expression of Recombinant Luciferase (r-luciferase). Posi-
tive clones were converted to lysogens in E. coli Y1089 cells
(16). Lysogens were grown at permissive temperatures and
induced with 1 mM isopropyl β-D-thiogalactopyranoside
(IPTG). Crude cell extracts were prepared and assayed for
luciferase activity as described below. The plasmid pTZR-
Luc-1 was constructed by ligation of a 2.2-kilobase-pair (kbp)
EcoRI/Sst I λRLuc-6 restriction fragment into the plasmid
pTZ18R (Pharmacia), which contains the lacZ' gene. E. coli
TG-1 cells (23) were transformed with pTZRLuc-1 (24).
Single colonies were isolated and grown at 37°C in LB
medium containing ampicillin (100 µg/ml) to an OD600 =
0.6-0.8 unit and induced with 1 mM IPTG for 4 hr. The cells
were centrifuged at 10,000 x g and frozen solid at -20°C. The
pellets were thawed and resuspended 1:5 in 10 mM EDTA,
pH 8, and lysozyme at 4 mg/ml (Sigma). After 20-min
incubation at 25°C, the cells were placed on ice for 1 hr and
then sonicated for 30 sec with a Branson cell disrupter. The
cell lysate was clarified by centrifugation at 30,000 x g. The
clarified lysate was used in subsequent bioluminescence
assays and emission studies.

Abbreviations: GFP, green fluorescent protein; IPTG, isopropyl
β-D-thiogalactopyranoside.

Assay for Renilla Luciferase Activity and Determination of
Emission Spectra. Bioluminescence assays (4) were done with
a Turner model TD-20e luminometer, and peak light inten-
sities were determined. Bioluminescence intensity was con-
verted to quanta per second by calibrating the instrument
relative to a radioactive 63Ni light standard that emits in the
460- to 480-nm region (25). Corrected emission spectra were
collected on an on-line computerized fluorimeter (26). A
100-µl sample of a clarified pTZRLuc-1 cell extract was
added to 1 ml of luciferase assay buffer (4) or to 1 ml of
"energy-transfer buffer" containing 1 x 10"* M GFP (8). An
excess of coelenterazine (0.47 mM) dissolved in MeOH was
added to maintain a strong emission signal.

Genomic Southern Blot Analyses. A 790-bp EcoRI/BamHI
cDNA restriction fragment was labeled to specific activities
≥1 x 10' cpm/µg with both [α-32P]dATP and dCTP (4000
Ci/mmol, ICN) by the random hexamer-priming method (27).
Genomic DNA was isolated from Renilla by a guanidine
thiocyanate method developed for coelenterate DNA isola- 
tion (D. Prasher, personal communication). DNA samples 
were digested with the appropriate enzymes and resolved in 
a 0.8% agarose gel, followed by transfer to nitrocellulose 
filters (Schleicher & Schuell) (28). Aqueous hybridizations 
and washes were done at high stringencies as described for a 
homologous probe (17). 

Electrophoretic Analysis of Protein. Protein samples were 
analyzed on 12.5% SDS/PAGE gels that were fixed and 
stained with Coomassie blue as described (29). Immunoblots 
were done as described (30). Proteins were transferred to 
nitrocellulose (Schleicher & Schuell) and incubated in a 1:50 
dilution of rabbit anti-native luciferase antibody. Detection of 
the secondary antibody (horseradish peroxidase-conjugated
goat anti-rabbit IgG) signal was determined according to the 
vendor's instructions (Bio-Rad). 

Computer-Facilitated DNA and Amino Acid Sequence Anal-
yses. The DNA sequence was compiled and manipulated
using MicroGenie sequence software (Beckman). 

RESULTS 

Synthesis of Luciferase Oligonucleotides. Seven luciferase 
peptides (V8-1-V8-7) were purified by HPLC, and their 
amino acid sequences were determined. Two of the peptides 
contained regions of relatively low codon degeneracy. Amino 
acid sequence from these regions was used to synthesize two 
mixed-sequence 17-base oligonucleotide hybridization 
probes with the following sequences: RLP-1 (GAR-AAY-
AAY-TTY-TTY-GT) and RLP-2 (AAR-AAR-TTY-
CCN-AAY-AT), which are 32- and 64-fold degenerate, re-
spectively.
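
The stated degeneracies can be reproduced by multiplying the number of bases each ambiguity symbol stands for. The sketch below is illustrative and not from the paper. It assumes the standard IUPAC meanings of the symbols (R = A or G, Y = C or T, N = any base) and reads the probe sequences as codon triplets joined by hyphens. The function name `degeneracy` is hypothetical.

```python
# Number of bases represented by each IUPAC nucleotide code used in the probes.
IUPAC_DEGENERACY = {"A": 1, "C": 1, "G": 1, "T": 1, "R": 2, "Y": 2, "N": 4}

PROBES = {
    "RLP-1": "GAR-AAY-AAY-TTY-TTY-GT",
    "RLP-2": "AAR-AAR-TTY-CCN-AAY-AT",
}


def degeneracy(probe):
    """Count the distinct sequences encoded by a mixed-sequence probe."""
    total = 1
    for symbol in probe.replace("-", ""):
        total *= IUPAC_DEGENERACY[symbol]
    return total


lengths = {name: len(seq.replace("-", "")) for name, seq in PROBES.items()}
folds = {name: degeneracy(seq) for name, seq in PROBES.items()}

if __name__ == "__main__":
    for name in PROBES:
        print(name, "length", lengths[name], "degeneracy", folds[name])
```

RLP-1 contains five two-fold positions (2^5 = 32), and RLP-2 contains four two-fold positions and one four-fold position (2^4 x 4 = 64), matching the values given in the text for two 17-base probes.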



Nucleotide and Deduced Amino Acid Sequence Analyses of
Renilla Luciferase. Six clones were isolated from the Renilla
cDNA library. One clone, λRLuc-6, hybridized to both
oligonucleotide probes. The cDNA insert could not be iso-
lated after EcoRI digestion, as one of the linker sites was lost
during cloning. A double digest with EcoRI and Sst I pro-
duced a 2.2-kbp fragment that contained a 1.2-kbp cDNA
with 1 kbp of λgt11 DNA at the 3' end. This fragment was
subcloned into the M13 vectors mp18 and mp19. DNA
sequencing provided the locations of six base restriction sites
(Fig. 1), which were used to generate specific sequencing
subclones. The entire 1.2-kbp luciferase cDNA was se-
quenced on both strands.

The cDNA, excluding the EcoRI linkers, is 1196 nucleo- 
tides long and encodes an open reading frame (ORF) of 314 
amino acids (Fig. 2). Although an ATG in-frame codon is 
found at the 5' end of the cDNA, the intrinsic mRNA may 
contain additional 5' coding nucleotides. If the first ATG 
codon in the ORF is designated as the initiation codon, the 
predicted 311-amino acid sequence is essentially identical in 
size (34 kDa) and composition to native Renilla luciferase (4). 
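
The step from a 314-codon ORF to a 311-residue protein follows from the position of the first in-frame ATG. The sketch below uses the first 20 codons printed in Fig. 2; it assumes the fourth codon, which is garbled in this copy, reads ATG, as the Met in the translation line and the Discussion ("triplet position 4") indicate. The helper name is illustrative.

```python
# First 20 codons of the cDNA as printed in Fig. 2. The fourth codon is
# assumed to be ATG (it is garbled in this copy but is translated as Met).
CDNA_5_PRIME = (
    "AGC TTA AAG ATG ACT TCG AAA GTT TAT GAT "
    "CCA GAA CAA AGG AAA CGG ATG ATA ACT GGT"
).split()

ORF_LENGTH = 314  # codons in the full ORF, as stated in the text


def first_atg_position(codons):
    """Return the 1-based codon position of the first in-frame ATG, or None."""
    for position, codon in enumerate(codons, start=1):
        if codon == "ATG":
            return position
    return None


start = first_atg_position(CDNA_5_PRIME)
# Codons upstream of the start codon are not part of the predicted protein.
residues_from_start = ORF_LENGTH - (start - 1)
print(f"first ATG at codon {start}; predicted protein of {residues_from_start} residues")
```

With the ATG at codon 4, three upstream codons are excluded, which gives the 311-residue protein discussed in the text.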

Comparison of the deduced amino acid sequence with the 
native peptides reveals that λRLuc-6 encodes a luciferase 
cDNA (Fig. 2). One discrepancy lies at amino acid residue 
222, which is leucine in the peptide sequence but tryptophan 
in the deduced sequence. Sequencing autoradiograms from 
this region of the clone have been examined carefully and 
found free of any irregularities. The protein sequence also 
contains a consensus N-linked glycosylation site (Asn-Xaa- 
Ser) beginning at residue 92. 

Genomic Southern Analysis. A Renilla genomic Southern 
blot was probed with a 790-base-pair (bp) EcoRI/BamHI 
luciferase cDNA restriction fragment (Fig. 3). The BamHI 
digest, lane A, contains two hybridizing bands, as does the 
Sma I digest, lane B. The Bgl II digest, lane C, contains four 
bands. If luciferase is encoded by a single gene containing no 
introns, a single band would be expected in the BamHI and 
Sma I digests, as these two sites are not spanned by the 
hybridization probe. Similarly, two bands would be expected 
in the Bgl II digest. That the BamHI and Sma I digests contain 
two hybridizing bands shows either that there is more than 
one luciferase gene or that the luciferase gene(s) has introns 
containing BamHI and Sma I sites. The four bands seen in the 
Bgl II digest could be explained by two very large introns containing 
Bgl II sites. When genomic DNA was digested with restric- 
tion enzymes having no sites within the cDNA sequence, 
there were always at least two hybridizing bands 
(data not shown). These results suggest that luciferase is 
encoded by more than one gene, which may or may not 
contain introns. 

Luciferase Expression in E. coli. The λRLuc-6 lysogen is 
capable of low-level r-luciferase expression as determined by 
light emission from clarified, crude extracts (5 x 10^** 
hν·sec⁻¹·ml⁻¹). When these cells are induced with 1 mM 
IPTG, light emission decreases by 2-fold; this happens be- 
cause the cDNA is in the reverse orientation with respect to the 
λgt11 lacZ promoter. Presumably, when IPTG is absent, the 
luciferase gene is transcribed from a promoter in the right end 
of λgt11, as reported (31). 

The 2.2-kbp EcoRI/Sst I fragment was subcloned into the 
plasmid pTZ18R, which uses the lacZ promoter. The ORF of 

Fig. 1. Location of six-base restriction enzyme sites within the luciferase cDNA. The boxed region defines the ORF. The EcoRI site at the 
5' end is a synthetic linker site. 

1 AGC TTA AAG ATG ACT TCG AAA GTT TAT GAT CCA GAA CAA AGG AAA CGG ATG ATA ACT GGT 60 
1 Ser Leu Lys Met Thr Ser Lys Val Tyr Asp Pro Glu Gln Arg Lys Arg Met Ile Thr Gly 20 

61 CCG CAG TGG TGG GCC AGA TGT AAA CAA ATG AAT GTT CTT GAT TCA TTT ATT AAT TAT TAT 120 
21 Pro Gln Trp Trp Ala Arg Cys Lys Gln Met Asn Val Leu Asp Ser Phe Ile Asn Tyr Tyr 40 

121 GAT TCA GAA AAA CAT GCA GAA AAT GCT GTT ATT TTT TTA CAT GGT AAC GCG GCC TCT TCT 180 
41 Asp Ser Glu Lys His Ala Glu Asn Ala Val Ile Phe Leu His Gly Asn Ala Ala Ser Ser 60 

181 TAT TTA TGG CGA CAT GTT GTG CCA CAT ATT GAG CCA GTA GCG CGG TGT ATT ATA CCA GAT 240 
61 Tyr Leu Trp Arg His Val Val Pro His Ile Glu Pro Val Ala Arg Cys Ile Ile Pro Asp 80 

241 CTT ATT GGT ATG GGC AAA TCA GGC AAA TCT GGT AAT GGT TCT TAT AGG TTA CTT GAT CAT 300 
81 Leu Ile Gly Met Gly Lys Ser Gly Lys Ser Gly Asn Gly Ser Tyr Arg Leu Leu Asp His 100 

301 TAC AAA TAT CTT ACT GCA TGG TTT GAA CTT CTT AAT TTA CCA AAG AAG ATC ATT TTT GTC 360 
101 Tyr Lys Tyr Leu Thr Ala Trp Phe Glu Leu Leu Asn Leu Pro Lys Lys Ile Ile Phe Val 120 

361 GGC CAT GAT TGG GGT GCT TGT TTG GCA TTT CAT TAT AGC TAT GAG CAT CAA GAT AAG ATC 420 
121 Gly His Asp Trp Gly Ala Cys Leu Ala Phe His Tyr Ser Tyr Glu His Gln Asp Lys Ile 140 

421 AAA GCA ATA GTT CAC GCT GAA AGT GTA GTA GAT GTG ATT GAA TCA TGG GAT GAA TGG CCT 480 
141 Lys Ala Ile Val His Ala Glu Ser Val Val Asp Val Ile Glu Ser Trp Asp Glu Trp Pro 160 

481 GAT ATT GAA GAA GAT ATT GCG TTG ATC AAA TCT GAA GAA GGA GAA AAA ATG GTT TTG GAG 540 
161 Asp Ile Glu Glu Asp Ile Ala Leu Ile Lys Ser Glu Glu Gly Glu Lys Met Val Leu Glu 180 

541 AAT AAC TTC TTC GTG GAA ACC ATG TTG CCA TCA AAA ATC ATG AGA AAG TTA GAA CCA GAA 600 
181 [Asn Asn Phe Phe Val Glu] Thr Met Leu Pro Ser Lys Ile Met Arg Lys Leu Glu Pro Glu 200 

601 GAA TTT GCA GCA TAT CTT GAA CCA TTC AAA GAG AAA GGT GAA GTT CGT CGT CCA ACA TTA 660 
201 Glu Phe Ala Ala Tyr Leu Glu Pro Phe Lys [Glu Lys Gly Glu Val Arg Arg Pro Thr Leu] 220 

661 TCA TGG CCT CGT GAA ATC CCG TTA GTA AAA GGT GGT AAA CCT GAC GTT GTA CAA ATT GTT 720 
221 [Ser Trp(+) Pro Arg Glu Ile Pro Leu Val Lys Gly] Gly Lys Pro Asp Val Val Gln Ile Val 240 

721 AGG AAT TAT AAT GCT TAT CTA CGT GCA AGT GAT GAT TTA CCA AAA ATG TTT ATT GAA TCG 780 
241 Arg Asn Tyr Asn Ala Tyr Leu Arg Ala Ser Asp Asp Leu Pro Lys Met Phe Ile [Glu Ser] 260 

781 GAT CCA GGA TTC TTT TCC AAT GCT ATT GTT GAA GGC GCC ... ... TTT CCT AAT ACT GAA 840 
261 [Asp Pro Gly Phe Phe Ser Asn Ala Ile Val Glu Gly Ala Lys Lys Phe Pro Asn Thr Glu] 280 

841 TTT GTC AAA GTA AAA GGT CTT CAT TTT TCG CAA GAA GAT GCA CCT GAT GAA ATG GGA AAA 900 
281 Phe Val Lys Val Lys Gly Leu His Phe Ser Gln Glu Asp Ala Pro Asp Glu Met Gly Lys 300 

901 TAT ATC AAA TCG TTC GTT GAG CGA GTT CTC AAA AAT GAA CAA TAA TTACTTT GGTTTTTTAT 962 
301 Tyr Ile Lys Ser Phe Val [Glu Arg Val Leu Lys Asn Glu] Gln 314 

963 TTACATTTTT CCCGGGTTTA ATAATATAAA TGTCATTTTC AACAATTTTA TTTTAACTGA ATATTTCACA 1032 

1033 GGGAACATTC ATATATGTTG ATTAATTTAG CTCGAACTTT ACTCTGTCAT ATCATTTTGG AATATTACCT 1102 

1103 CTTTCAATGA AACTTTATAA ACAGTGGTTC AATTAATTAA TATATATTAT AATTACATTT GTTATGTAAT 1172 

1173 AAACTCGGTT TTATTATAAA AAAA 1196 

Fig. 2. Nucleotide sequence and translated amino acid sequence of the Renilla luciferase cDNA. Putative and known translation control 
elements, as well as oligonucleotide hybridization sites, are underlined. Positions of native luciferase peptide sequences are boxed and, except 
at one residue (+), are identical to the deduced amino acid sequence obtained from the luciferase cDNA. Some of the native luciferase peptide 
sequences overlap at glutamic acid residues. 



the cDNA is not in frame with the lacZ' gene ORF of 
pTZ18R. Supernatants were prepared from IPTG-induced 
pTZRLuc-1 cells, as described, and the level of luciferase 
expression was measured by the standard luciferase assay. A 
high level of r-luciferase activity, 2 x 10^^ hν·sec⁻¹·ml⁻¹, is 
detected in clarified crude extracts of pTZRLuc-1 cells. This 
level of activity is 7-fold greater than in uninduced pTZR- 
Luc-1 cells. 

A prominent protein band (Mr = 34,000) migrating to the 
position of native luciferase is seen after SDS/PAGE of pTZRLuc-1 
crude extracts (Fig. 4). Crude extracts of IPTG-induced 
pTZRLuc-1 cells were analyzed by immunoblotting (Fig. 5). 
The protein band that reacts with antiluciferase antibody, lane 
B, corresponds to the same band seen in the Coomassie-stained 
gel (Fig. 4). Native luciferase was used as a positive control, 
lane A. No signal is detectable in the crude extract of pTZ18R 
cells, lane C. A duplicate filter incubated with preimmune 
serum showed no detectable signal. 

Bioluminescence Emission Spectra. The r-luciferase- 
catalyzed bioluminescence emission spectrum (Fig. 6a) is 

Fig. 3. Genomic Southern blot of R. 
reniformis DNA hybridized to a 790-bp 
EcoRI/BamHI luciferase cDNA frag- 
ment. Genomic DNA had been digested 
with the following restriction enzymes: 
BamHI (lane A); Sma I (lane B); and Bgl 
II (lane C). Samples were resolved in a 
0.8% agarose gel and transferred to ni- 
trocellulose. Each lane contains 20 µg of 
digested genomic DNA. Molecular size 
markers are in kbp. 



very similar to that seen with native luciferase (32). The 
r-luciferase emission spectrum has a λmax = 480 nm and a 
slight shoulder at 400 nm, which correspond to emission from 
the excited-state oxyluciferin monoanion and neutral spe- 
cies, respectively. Disproportionation between these species 
is sensitive to environmental factors (7); thus, this spectrum 
indicates the strong similarity of the active-site environment 
between r-luciferase and the native enzyme. Although an 
increase in quantum yield has yet to be determined, r-luci- 
ferase can clearly transfer energy in the presence of Renilla 
GFP (Fig. 6b). The emission spectrum dramatically shifted 

Fig. 4. SDS/PAGE analysis of total protein from IPTG-induced 
E. coli cells transformed with either pTZRLuc-1 or pTZ18R. Ten- 
milliliter cultures were grown to an OD600 of 0.8 and induced with 1 
mM IPTG for 4 hr. One milliliter of cell culture, OD600 = 5.0, was 
pelleted and resuspended in 0.5 ml of SDS sample buffer. Samples 
were boiled for 5 min, and 20 µl was loaded per lane: native luciferase 
(10 µg) (lane A); pTZRLuc-1 cells (lane B); and pTZ18R cells (lane 
C). Molecular weight (Mr x 10^-3) standard positions are indicated. 
Arrow shows position of native luciferase (L'ase). 




Fig. 5. Immunoblot analysis of total 
protein. Sample preparation and electro- 
phoresis were the same as in Fig. 4. 
Native luciferase (2 µg) (lane A); 10 µl of 
pTZRLuc-1 cell extract (lane B); and 10 
µl of pTZ18R cell extract (lane C). Mo- 
lecular weight (Mr x 10^-3) standard po- 
sitions are indicated. 



from the broad emission band generated by r-luciferase to the 
narrow, structured emission band (λmax = 509 nm) seen when 
GFP is present. The emission spectrum generated with 
r-luciferase and GFP is very similar to the same spectrum 
generated with native luciferase and GFP (33). 

Fig. 6. Bioluminescence emission spectra generated with crude 
r-luciferase and r-luciferase plus GFP. Crude pTZRLuc-1 cell ex- 
tracts were prepared, as described, and 100 µl of 1 x 10"* M 
r-luciferase, as determined by peak light emission, was used to 
generate each spectrum. (a) Emission spectrum of crude r-luciferase. 
(b) Spectrum that results when 1 x 10"* M Renilla GFP is added to 
crude r-luciferase in energy-transfer buffer. 

DISCUSSION 

This work describes the isolation of a 1.2-kbp R. reniformis 
luciferase cDNA capable of directing the expression of 
r-luciferase. The cDNA contains an ORF encoding a 314- 
amino acid sequence in which all of the native luciferase 
peptide sequences obtained from V8-protease digestion are 
found. Rescreening the cDNA library with a 790-bp luciferase 
cDNA fragment as a hybridization probe has failed to pro- 
duce other clones that contain the 5' noncoding region; 
therefore, whether the luciferase cDNA is full-length is not 
known. This uncertainty can be resolved by sequencing 
genomic clones corresponding to the 5' end of the luciferase 
gene. 

The genomic Southern hybridization data indicate that 
Renilla luciferase is probably encoded by more than one 
gene, which may or may not contain intervening sequences. 
Further characterization of luciferase genomic clones will be 
required before the genetic organization of the luciferase 
gene(s) can be defined. 

A putative initiation codon located at triplet position 4 of 
the ORF may be the translation initiation site for Renilla 
luciferase; the 311-amino acid sequence is essentially iden- 
tical to native luciferase with respect to its composition and 
predicted molecular weight. Irrespective of whether this 
cDNA is full length, the luciferase that it encodes is ex- 
pressed in pTZRLuc-1 cells and is catalytically active. The 
expression data demonstrate that r-luciferase is the same size 
as native luciferase on SDS/PAGE gels and is reactive with 
polyclonal rabbit antibodies raised against native Renilla 
luciferase. Expression of r-luciferase from the plasmid pTZR- 
Luc-1 is "leaky" because activity can be detected from 
uninduced cell cultures. The luciferase cDNA ORF is not in 
frame with the short lacZ' ORF contained in this construct. 
Any translation product initiating at the β-galactosidase 
sequence of pTZRLuc-1 would be terminated at a stop codon 
immediately adjacent to the putative initiation codon in the 
luciferase cDNA. Thus, the r-luciferase seen in SDS/PAGE 
gels does not contain any β-galactosidase sequence. We 
propose that expression of r-luciferase by pTZRLuc-1 is due 
to a translation coupling mechanism (34). 

r-luciferase displays two very important characteristics of 
native luciferase: the ability to catalyze coelenterazine oxi- 
dation with the concomitant emission of blue light (λmax = 480 
nm) and the ability to transfer energy to Renilla GFP with the 
production of green light (λmax = 509 nm). The two emission 
bands at 400 nm and 480 nm in the r-luciferase spectrum 
verify the strong similarity between the native and recombi- 
nant proteins and suggest that the integrity of the luciferase 
active site has been maintained. Furthermore, that energy 
transfer occurs in the presence of GFP shows that the 
luciferase domain(s) required for the interaction between 
luciferase and GFP is present in r-luciferase. Once pure 
r-luciferase is available, energy transfer quantum yield mea- 
surements will offer a more quantitative determination of the 
efficiency of the nonradiative energy-transfer process. Fi- 
nally, the data demonstrate that N-linked glycosylation is not 
required for luciferase activity because E. coli do not perform 
this modification (35). r-luciferase in E. coli crude extracts 
behaves like the native enzyme by every criterion examined 
thus far. 

The use of the Escherichia coli enzyme β-glucuronidase (GUS) as a reporter in gene expression stud- 
ies is limited due to loss of activity during tissue fixation by glutaraldehyde or formaldehyde. We have 
directed the evolution of a GUS variant that is significantly more resistant to both glutaraldehyde and 
formaldehyde than the wild-type enzyme. A variant with eight amino acid changes was isolated after 
three rounds of mutation, DNA shuffling, and screening. Surprisingly, although glutaraldehyde is known 
to modify and cross-link free amines, only one lysine residue was mutated. Instead, amino acid changes 
generally occurred near conserved lysines, implying that the surface chemistry of the enzyme was 
selected to either accept or avoid glutaraldehyde modifications that would normally have inhibited func- 
tion. We have shown that the GUS variant can be used to trace cell lineages in Xenopus embryos under 
standard fixation conditions, allowing double staining when used in conjunction with other reporters. 

Keywords: β-glucuronidase, reporter gene, in vitro evolution, directed evolution, DNA shuffling, Xenopus laevis 



Since plants express endogenous β-galactosidase activity, lacZ can- 
not be employed as a reporter gene. Instead, the Escherichia coli β- 
glucuronidase gene (gusA, formerly uidA) has been developed as a 
reporter gene for plants, and has been widely used for over a 
decade. Both chromogenic and fluorogenic GUS substrates have 
been synthesized, allowing rapid nonradioactive assays. The GUS 
enzyme is stable and active under a variety of conditions, even 
when fused to other sequences. 

The utility of GUS as a reporter, however, has been constrained 
in three ways. First, many animal systems, and some plants and 
plant-associated bacteria, express endogenous glucuronidase activi- 
ties. Second, GUS activity is greatly reduced during tissue fixation 
by glutaraldehyde or formaldehyde, making it necessary to trade off 
retention of activity for preservation of tissue structure. Third, 
both of these considerations drastically restrict the use of GUS as a 
reporter gene in vertebrate systems. 

Enzymatic inactivation by aldehydes is largely due to the for- 
mation of Schiff bases with surface-accessible lysine residues. 
While the removal of lysine residues by directed mutation might 
render an enzyme more resistant to fixatives, many surface lysines 
are critical for function and cannot be readily changed. The 
sequences of the E. coli, human, mouse, rat, and dog 
homologs are known. Six of the 27 lysine residues in the E. coli 
protein are conserved in the other species and thus are likely 
essential. Moreover, to find what combination of the 27 lysine 
residues could be changed in order to increase resistance to fixa- 
tives without abrogating enzyme activity would require con- 
structing and assaying a dauntingly large number of mutant 
enzymes. Therefore, in order to alter the surface chemistry of 
GUS, either to avoid or to accommodate aldehyde modifications 
without loss of enzyme activity, we employed a random mutation- 
al approach similar to those previously proven useful for altering 
enzyme substrate specificity or thermostability. 

Results 

Directed evolution of glutaraldehyde-resistant variants. Random 
mutations were initially introduced into the gusA structural gene by 
mutagenic PCR. Mutated PCR products were ligated into the 
expression vector gusA-pBSA and transformed into E. coli. When 
the library was induced on plates containing the chromogenic GUS 
substrate, 5-bromo-4-chloro-3-indolyl-β-D-glucuronide (X-gluc), 
approximately 80% of the colonies were visibly less green than con- 
trol colonies expressing only the chromosomal gene (see 
Experimental protocol). β-Glucuronidase functions as a tetramer, 
so it was likely that many of the mutations in the highly expressed, 
plasmid-borne library had a dominant negative effect on the func- 
tion of the chromosomal gene. This did not deter us from utilizing 
this library for screening experiments, since successive rounds of 
DNA shuffling should efficiently select against neutral or deleteri- 
ous mutations. 

Nine thousand replica-plated colonies, each expressing a ran- 
domly mutated gusA gene, were exposed to buffer containing 0.2% 
glutaraldehyde for 20 min. The colony remnants were then incubat- 
ed in buffer containing X-gluc and the histochemical indicator, 
nitroblue tetrazolium (NBT) (Fig. 1). The catalytic activity of the 
wild-type enzyme is greatly diminished under those conditions, 
indicating that the glutaraldehyde disrupts the cell membranes and 
covalently modifies many intracellular proteins, including GUS (cf. 
Figs 2A and B). Of all the variants examined, only 10 colonies repro- 
ducibly exhibited greater catalytic activity than control colonies 
expressing wild-type gusA (Fig. 2B). The corresponding colonies on 
master plates were isolated, and their expression vectors purified 

Figure 1. Screen for glutaraldehyde-resistant β-glucuronidase (GUS) 
function (sequence from top left). A library of randomly mutated β- 
glucuronidase genes (gusA) is subcloned into an inducible 
expression vector and transformed into Escherichia coli. The 
resulting colonies are transferred to a nitrocellulose filter, which is 
overlaid upon an agarose plate containing an inducer and incubated 
for 12-24 h at 37°C. The filter-bound colonies are incubated in buffer 
containing glutaraldehyde, then transferred to buffer containing the 
histochemical indicators of β-glucuronidase, X-gluc and NBT. A 
brief incubation in 4 M guanidine HCl (Gdn-HCl) arrests color 
development. Colonies that retain GUS activity are isolated from the 
original plate and randomly recombined by DNA shuffling for the next 
round of screening. 



and pooled. The variant gusA genes were amplified by PCR 
and randomly recombined by DNA shuffling. We then screened 
6,000 random recombinants in a second round for variants that 
retained catalytic activity after a 20 min incubation in 1.0% glu- 
taraldehyde. Nine colonies contained variants that exhibited greater 
residual catalytic activity than the most resistant clone isolated in 
the first round of screening (Fig. 2C). These variants were again 
pooled, amplified, and randomly recombined. Then, 6,000 recom- 
binants were screened in a third round for variants that retained 
catalytic activity after a 20 min incubation in 3.5% glutaraldehyde. 
Again, nine improved clones were isolated, one of which (GUS^**) 
reproducibly showed the greatest activity under the most stringent 
conditions (Fig. 2D). 

In vitro characterization of GUS^*^. To determine whether the 
colony-lift assay significantly influenced the apparent fixative- 
resistant phenotype of GUS^, activity assays were also carried out 
in cell extracts. The GUS-deficient strain, pREP4/GMS407, was 
transformed with vectors that expressed either wild-type gusA, the 
GUS^'^ variant, or the lacZ α-fragment. Cell extracts from induced 
cultures were treated with glutaraldehyde or formaldehyde for 20 
min at 23°C, and diluted 100-fold in buffer containing saturating 
concentrations of a GUS substrate, p-nitrophenyl β-D-glu- 
curonide (PNPG). The extracts containing wild type or GUS^ 
catalyzed the hydrolysis of PNPG; no hydrolysis was detected in 
the negative control extracts, in which only the lacZ α-fragment 
was expressed (data not shown). Treatment of the extract contain- 
ing wild-type GUS with only 0.04% glutaraldehyde for 20 min at 
23°C reduced catalytic activity by 99.6 ± 0.24%. In sharp contrast, 
the GUS^ extract retained 78.1 ± 0.69% of its activity after treat- 
ment with a fivefold higher (0.2%) concentration of glutaralde- 
hyde (Fig. 3A). 



[Figure 2, panels A-D: colony streaks treated with 0%, 0.2%, 1.0%, and 3.5% glutaraldehyde; streaks are labeled as round 1 winners, round 2 winner, and round 3 winner.] 

Figure 2. Detection of glutaraldehyde-resistant GUS activity. 
Escherichia coli cells transformed with vectors expressing the wild- 
type (A, B left), a pool of the ten glutaraldehyde-resistant variants 
from round 1 (B right, C left) or the most resistant variants from 
rounds 2 (C right, D left), or 3 (D right) were streaked onto 
noninducing plates. The colonies were propagated, induced, and 
treated for 20 min with the indicated concentrations of 
glutaraldehyde, then reacted with X-gluc and NBT, as described in 
the legend to Figure 1. 



Extracts were also separately treated with formaldehyde to assess 
whether the fixative-resistant phenotype was specific to glutaralde- 
hyde. Again, the GUS^ variant exhibited much greater resistance to 
the fixative than did the wild-type GUS. The wild-type extract 
retained only 4.6 ± 0.05% of its catalytic activity after treatment 
with 0.08% formaldehyde; the GUS^** extract retained 62.4 ± 0.40% 
activity after incubation with 0.4% formaldehyde (Fig. 3B). To 
determine how sequence and chemical modifications may have 
influenced GUS activity, we conducted kinetic studies of the mutant 
enzyme (Fig. 3C). The wild-type and evolved genes were sub- 
cloned, expressed as fusion proteins with N-terminal hexahistidine 
tags, and purified by immobilized metal ion adsorption chromatog- 
raphy. Purified enzymes were assayed with varying concentrations 
of PNPG (Fig. 3C). The kinetic parameters of the wild-type enzyme 
(Km for the complex with PNPG = 110 ± 2.9 µM; kcat = 920 ± 7.3 s⁻¹) 
were very similar to those of the GUS^ variant (Km = 150 ± 3.9 µM; 
kcat = 750 ± 5.8 s⁻¹). The kinetic parameters of wild-type and evolved 
enzymes were also determined following reaction with a sublethal 
concentration (0.04%) of formaldehyde for 20 min at 23°C. 
Formaldehyde had a larger effect on the turnover number (250 ± 17 
s⁻¹ for the partially modified wild type, 650 ± 4.7 s⁻¹ for the modified 
GUS^ variant), than on the Michaelis constants (99 ± 6.7 µM and 
170 ± 3.7 µM for the modified wild-type and GUS'^'* enzymes, 

Figure 3. GUS catalytic activity as a function of aldehyde concentration. Escherichia coli cell extracts containing the wild-type (circles) or evolved 
GUS'^ (squares) β-glucuronidase were incubated for 20 min at 23°C in buffer containing the indicated concentrations of glutaraldehyde (A) or 
formaldehyde (B). The protein was then diluted 100-fold into buffer containing the chromogenic GUS substrate, p-nitrophenyl-β-D-glucuronide 
(PNPG). Hydrolysis of the substrate was followed at 405 nm using a spectrophotometer (see Experimental Protocol). Each point represents the 
average of three initial velocity values; the points subsume the error bars. The control extract from an isogenic strain not expressing GUS does not 
have detectable activity (not shown). (C) Purified wild-type (circles) or evolved GUS^ (squares) enzymes were incubated for 20 min at 23°C in buffer 
containing 0% (empty symbols) or 0.04% (filled symbols) formaldehyde, then reacted with the indicated concentrations of PNPG. The initial velocity 
values were fitted to the Michaelis-Menten equation (lines); the derived kinetic parameters are presented in the text. 



respectively). These results in conjunction with the cell extract 
data show that GUS^*^ is expressed at lower levels than the wild 
type, but is inherently more aldehyde resistant. 
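
The kinetic comparison above can be restated numerically. The following sketch plugs the reported Km and kcat values into the Michaelis-Menten rate law, v/[E] = kcat[S]/(Km + [S]), and computes the fraction of kcat retained after formaldehyde treatment. The class, dictionary keys, and function names are illustrative only, and the reported uncertainties are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kinetics:
    km_uM: float       # Michaelis constant for PNPG, in micromolar
    kcat_per_s: float  # turnover number, in s^-1

    def efficiency(self) -> float:
        """Catalytic efficiency kcat/Km, in s^-1 uM^-1."""
        return self.kcat_per_s / self.km_uM

    def turnover_rate(self, substrate_uM: float) -> float:
        """Michaelis-Menten rate per enzyme: kcat*[S] / (Km + [S])."""
        return self.kcat_per_s * substrate_uM / (self.km_uM + substrate_uM)


# Central values reported in the text.
ENZYMES = {
    "wild type, untreated": Kinetics(km_uM=110, kcat_per_s=920),
    "variant, untreated": Kinetics(km_uM=150, kcat_per_s=750),
    "wild type, 0.04% formaldehyde": Kinetics(km_uM=99, kcat_per_s=250),
    "variant, 0.04% formaldehyde": Kinetics(km_uM=170, kcat_per_s=650),
}


def retained_kcat(before: Kinetics, after: Kinetics) -> float:
    """Fraction of the turnover number that survives aldehyde treatment."""
    return after.kcat_per_s / before.kcat_per_s


for label, enzyme in ENZYMES.items():
    print(f"{label}: kcat/Km = {enzyme.efficiency():.2f} s^-1 uM^-1")

wt = retained_kcat(ENZYMES["wild type, untreated"],
                   ENZYMES["wild type, 0.04% formaldehyde"])
var = retained_kcat(ENZYMES["variant, untreated"],
                    ENZYMES["variant, 0.04% formaldehyde"])
print(f"kcat retained: wild type {wt:.0%}, variant {var:.0%}")
```

On these figures the wild-type enzyme keeps roughly a quarter of its turnover number after treatment while the evolved variant keeps most of it, and Km changes comparatively little in either case.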

Sequence and structural mapping of the evolved GUS'^'* vari- 
ant. Upon sequencing, the evolved gusA gene was found to contain 
the following amino acid substitutions: N66D, D151N, A219V, 
I396T, T480A, Q498R, D508E, and K567R, as well as six silent 
mutations. Only one of these changes, the D508E mutation, 
results from a transversion, indicating a strong transition bias in 
our random mutagenesis methods. The amino acid sequences of 
the E. coli and human GUS proteins are 50% identical and could 
be readily aligned by the algorithm devised by Needleman and 
Wunsch using the program GAP 4.0 (Genetics Computer 
Group, Madison, WI). Both proteins are tetramers and are vir- 
tually identical in substrate specificity. The positions of the loci 
that were altered in the evolved E. coli enzyme could thus be tenta- 
tively mapped onto the crystal structure of the human GUS pro- 
tein (Fig. 4). 
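
The transition/transversion distinction drawn for D508E can be illustrated with the standard genetic code. A transition swaps a purine for a purine or a pyrimidine for a pyrimidine, and a transversion swaps one class for the other. The sketch below enumerates single-base codon changes for Asp to Glu and for Lys to Arg (AGA or AGG, the arginine codons named in the Discussion); the helper names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("ref and alt must differ")
    pair = {ref, alt}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def single_base_changes(from_codons, to_codons):
    """Yield (from, to, class) for codon pairs that differ at exactly one base."""
    for a in from_codons:
        for b in to_codons:
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                yield a, b, classify(*diffs[0])


ASP = ("GAT", "GAC")      # aspartate codons
GLU = ("GAA", "GAG")      # glutamate codons
LYS = ("AAA", "AAG")      # lysine codons
ARG_AGR = ("AGA", "AGG")  # the two arginine codons one step from lysine

asp_to_glu = list(single_base_changes(ASP, GLU))
lys_to_arg = list(single_base_changes(LYS, ARG_AGR))

for a, b, kind in asp_to_glu + lys_to_arg:
    print(f"{a} -> {b}: {kind}")
```

Every single-base route from an Asp codon to a Glu codon changes a pyrimidine to a purine at the third position, so D508E can only arise by transversion, whereas Lys to Arg via AGA or AGG needs just an A-to-G transition.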

The 10 gusA mutants isolated in the first round of screening 
were sequenced; mutations were found at a frequency of three per 
1.8 kb. Seven of the first-round variants contained amino acid 
substitutions (K567R, T480A, D508E or N66D/D151N) that were 
subsequently found in the most active third-round GUS^ variant. 
It is instructive that many of the single substitutions that confer 
modest resistance to aldehyde modification can interact additively 
or synergistically to confer robust resistance to aldehyde modifica- 
tion. 

GUS^^ as a lineage tracer in Xenopus embryos. The N358S 
mutant of GUS is a commercially available and commonly used 
reporter gene in plants. The N358S mutation eliminates a cryptic 
glycosylation site, and should not affect its function in the cyto- 
plasm; we chose this construct because it also contained the 
upstream sequences necessary for expression in eukaryotic cells. 
In order to determine if the fixative-resistant GUS might also 
prove useful in other model organisms, transcripts encoding 



GUS^ and N358S GUS were microinjected into 16- to 32-cell 
stage Xenopus embryos. Two days later, the embryos were fixed in 
3.7% formaldehyde for 20 min and stained using a standard pro- 
tocol for the detection of lacZ expression, except that X-gluc was 
substituted for X-gal. The descendants of cells injected with the 
wild-type-like N358S GUS mRNA did not change color (Fig. 5A). 
Endogenous GUS activity was apparently also abrogated by the 20 
min incubation in 3.7% formaldehyde. In contrast, the descen- 
dants of cells injected with the GUS'^'* mRNA turned bright blue- 
green (Fig. 5B). 

In order to determine if multiple reporters might be used in 
tandem for lineage analysis, mRNAs encoding either N358S or the 
GUS^*^ were co-injected into embryonic cells along with mRNA 
encoding lacZ. When the embryos were first stained with X-gluc, 
again only the cells that inherited the GUS'^'* mRNA turned blue- 
green (Fig. 6A and B). The embryos were subsequently stained 
with rose-gal, a β-galactosidase substrate that forms a red precipi- 
tate. In embryos that received either the N358S or GUS^**, some 
cells were colored red, indicating the inheritance of lacZ. However, 
in embryos that received GUS^** some cells or patches were also 
purple, indicating the co-inheritance of the GUS^** and lacZ (Fig. 
6C and D). The lacZ mRNA in this experiment also served as an 
internal control that demonstrated that RNAs were entering cells 
and surviving until fixation. 

Discussion 

Mechanisms of aldehyde resistance. The E. coli GUS protein con- 
tains 27 lysine residues, six of which are conserved among the 
sequenced GUS genes. Although the fraction of lysine residues 
that are modified by aldehydes is unknown, wild-type GUS activi- 
ty is quite susceptible to even low levels of fixatives. For example, 
the catalytic activities of the wild-type (Fig. 3B) and N358S GUS 
(Figs 5A and 6A) are inactivated by <3.7% formaldehyde, while β- 
galactosidase, which contains 20 lysine residues, is not (Fig. 6). 
Taken together, these results suggest that one or more of the GUS 

Figure 4. Homology mapping of amino acid substitutions that confer 
aldehyde resistance. The crystal structure of a subunit of the Cα 
trace of human GUS is shown. The amino acid sequences of the E. 
coli and human GUS proteins were aligned using the application GAP 
4.0 (Genetics Computer Group) and were found to be 48.5% identical. 
The positions of lysine residues are darkened, and the conserved 
lysines are labeled. The positions of amino acid substitutions in the 
evolved GUS^*^ E. coli protein are shown as balls. 

lysine residues is either itself critical for activity or presents a con- 
jugation site that leads to functional disruption. However, identi- 
fying which of the many lysine residues in GUS were responsible 
for inhibition by fixatives would have been a daunting task. 
Instead, we relied on random mutagenesis to identify GUS variants 
with catalytic activity resistant to aldehydes. Following three rounds 
of screening and amplification, we isolated an octuple-mutant 
GUS^** with catalytic activity resistant to roughly 80-fold higher lev- 
els of glutaraldehyde than the wild-type activity (Fig. 3A). 

Surprisingly, only one of the amino acid substitutions, K567R, 
in the evolved GUS^^ occurred at a lysine residue. Since AAA or 
AAG encodes lysine, the apparent transition bias in our random- 
mutagenesis method and the size of our initial library provided 
ample opportunities for each lysine to conservatively mutate into 
arginine (AGA or AGG). While it is possible that mutation of this 
single lysine was largely responsible for protection against aldehy- 
des, this explanation is unlikely. Three of the 10 clones isolated in 
the first round of screening contain the K567R sequence substitu- 
tion, but none of these first-round isolates are as resistant as any 
of the second-round isolates (Fig. 2G). The aldehyde resistance of 
GUS progressively increased over three rounds of screening and 
selection, and the final product had accumulated seven additional 
amino acid substitutions. The finding that amino acid substitu- 
tions that modulate protein function are dispersed in the primary 
and tertiary structure of GUS is congruent with previous attempts 
to evolutionarily engineer the physical and kinetic parameters of 
enzymes. Experiments that directed an increase in the catalytic 
activity of a p-nitrobenzyl esterase in organic solvents yielded
multiple sequence substitutions scattered throughout the tertiary 
structure^'. Site-directed mutation studies of T4 lysozyme have 
shown that stabilizing amino acid changes, which occur in the 
core of that enzyme, are additive in effect^^. 
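
The codon argument made earlier in this paragraph (that a single transition can convert either lysine codon into an arginine codon) can be checked by enumeration. The following is a minimal sketch only; the function and table names are illustrative and are not part of the original methods, and the codon table holds just the six entries needed here.

```python
# Subset of the standard genetic code: only the codons reachable from the
# two lysine codons by a single transition are needed for this illustration.
CODON_TABLE = {
    "AAA": "Lys", "AAG": "Lys",
    "AGA": "Arg", "AGG": "Arg",
    "GAA": "Glu", "GAG": "Glu",
}

# Transitions exchange purine for purine (A<->G) or pyrimidine for pyrimidine (C<->T).
TRANSITIONS = {"A": "G", "G": "A", "C": "T", "T": "C"}


def transition_outcomes(codon):
    """Map each single-base transition mutant of `codon` to its amino acid."""
    outcomes = {}
    for i, base in enumerate(codon):
        mutant = codon[:i] + TRANSITIONS[base] + codon[i + 1:]
        outcomes[mutant] = CODON_TABLE[mutant]
    return outcomes


for lysine_codon in ("AAA", "AAG"):
    print(lysine_codon, transition_outcomes(lysine_codon))
```

For each lysine codon, one of the three possible transitions gives arginine, one gives glutamate, and one is silent, which is consistent with the statement that a transition-biased mutagenesis method offers each lysine a direct route to arginine.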



Interestingly, the seven non-lysine amino acid substitutions
mapped onto the surface of the protein near lysine residues (Fig. 
4). Protein structure is more highly conserved than protein 
sequence^^, and since the primary sequences of the E. coli and
human GUS enzymes are quite similar (48.5% identity^), it can be 
conservatively assumed that their tertiary structures also align 
well. Based on this assumption, we can advance hypotheses 
regarding the contributions of individual amino acid substitutions
to aldehyde resistance. For example, Lys568 (E. coli numbering)
is conserved among the sequenced GUS genes, and is in the
active site'^. The Cα-Cα distance from Lys568 to the D508E
substitution is 3.97 Å (Fig. 4). Since the lysine side chain is 7 Å in
length, this adjacent sequence substitution might raise the pKa of
the epsilon amino group of Lys568, thereby reducing its reactivity
with aldehydes. Similarly, the K567R substitution already
mentioned is within 3.81 Å of the active-site lysine, and mutation to
arginine may prevent modification that could sterically interfere
with substrate binding or catalysis.

Similarly, β-glucuronidase is active only as a tetramer"* '^ and
lysines play a key role in its quaternary structure. To the extent 
that modifications of interfacial lysines disrupt quaternary struc- 
ture and enzymatic function, adjacent amino acid substitutions 
could render these lysines less reactive. In this regard, the loop 
containing D151N and three lysines is <5 Å away from the α-helix
of the adjacent subunit containing T480A and three other lysines. 
These amino acid substitutions could also prevent structural and 
functional disruption by independently increasing the affinity 
between the subunits. For example, the A219V substitution also
maps to the other interface, although it is not immediately adja- 
cent to any lysines. 

Overall, it appears the surface chemistry of the enzyme has 
coordinately evolved either to cause lysines to be less reactive or to 
functionally accommodate covalent modification of lysines. Our 
results suggest that there may be multiple possible routes by which 
proteins could be adapted to function in a wide variety of fixatives 
or solvent systems. More importantly, they suggest a way of aug- 
menting protein chemistry by introducing amino acids with novel 
surface conjugates. 




Figure 5. Expression of GUS*** in Xenopus embryos. Embryos at the 
16- to 32-cell stage were injected with 1 ng of mRNA encoding N358S
(A) or GUS^'^ (B) and fixed two days later in 3.7% formaldehyde for 20 
min. GUS activity was detected using the chromogenic substrate X- 
gluc (light blue/green). The reddish-purple color of the cement gland 
of the embryo shown in (A) is from a natural pigment, and the blue 
color of the embryo shown in (B) is from GUS*" activity. 



NATURE BIOTECHNOLOGY VOL 17 JULY 1999 http://biotech.nature.com



© 1999 Nature America Inc. • http://biotech.nature.com




Figure 6. Multiple marker staining of Xenopus embryos. Embryos were co-injected with 0.5
ng of lacZ mRNA and 1 ng of either N358S GUS (A, C) or GUS*" (B, D), and subsequently fixed
in 3.7% formaldehyde for 20 min. Following fixation, embryos were reacted with X-gluc (light
blue/green in all frames). As in Figure 5, embryos injected with mRNA encoding GUS*" (B)
stained much more intensely than those injected with N358S GUS mRNA (A). All embryos
were subsequently rinsed free of X-gluc and stained with the chromogenic substrate for
β-galactosidase, rose-gal (red in C and D).



GUS^^ as a universal reporter gene. Since most naturally 
occurring β-glucuronidases are likely to be fixative-labile, the fixative-resistant
GUS^** we have isolated should prove useful for
expanding the range and power of GUS staining techniques. In 
addition, it should be possible to develop methods for following 
multiple genes or cell lineages in parallel. Such methods generally 
rely on protocols in which fixed tissues are reacted with antibodies 
conjugated to dyes or reporter enzymes (for example, see ref. 24). 

Reporter genes are very commonly used in Xenopus as cell lin- 
eage tracers, and have proved important for gene expression stud- 
ies in developing embryos^^^^. Following mRNA microinjection, 
the fixative-resistant GUS could be specifically followed in 
Xenopus relative to both background activity and the wild-type 
reporter. Moreover, a lineage trace in tandem with β-galactosidase
demonstrated the use of GUS in a multiple-reporter format. These
experiments pave the way for the practical development of two-enzyme
reporter systems, and could potentially be combined with
a β-lactamase reporter system developed by Raz et al. to create
three-enzyme reporter systems.

The results in Xenopus embryos are notable in that no special
precautions were taken to enhance gene expression or enzymatic
activity. In contrast to the transformation of reporter constructs,
microinjected reporter mRNAs do not replicate and their
dosage progressively decreases as messages are segregated or bro- 
ken down. Further, no attempt was made to increase the signal 
intensity of GUS^** by fusing it to a nuclear localization signal, as 
was the case for the lacZ control. Nor were fluorescent or other 
highly sensitive commercially available GUS substrates^ utilized. 
In short, the evolved enzyme is itself robust enough so that new 
staining techniques can easily be adapted from extant methods. 

Experimental protocol 

Materials. DNA-modifying enzymes, including restriction enzymes and
Vent polymerase, were purchased from New England Biolabs (Beverly,
MA). Deoxyribonuclease I was from GIBCO-BRL (Gaithersburg, MD).
Taq polymerase was expressed and purified as described by Grimm and
Arbuthnot^^. DNA sequencing kits were from Perkin-Elmer/Applied
Biosystems (Foster City, CA). Cloning vector pGEM-5 was from Promega
(Madison, WI), pBluescript II SK(+) was from Stratagene (La Jolla,
CA), and the regulatory vector pREP4 was from Qiagen (Chatsworth, CA).
pGUS N358S was from Clontech (Palo Alto, CA), and pET28a(+) from
Novagen (Madison, WI). DNA purification columns were purchased from
Qiagen (Chatsworth, CA). X-gluc was from Gold Biotechnology (St. Louis,
MO) and Butterfly nitrocellulose membranes from Schleicher and Schuell
(Keene, NH). The mMessage mMachine SP6 in vitro mRNA transcription kit
was from Ambion (Austin, TX). MicroSpin G-25 Sephadex spin columns were
from Pharmacia Biotech (Piscataway, NJ). Escherichia coli strain
INVαF' was from Invitrogen (Carlsbad, CA), W3110 (ATCC No. 27325) from
the ATCC (Rockville, MD), GMS407 from the E. coli Genetic Stock Center
(New Haven, CT), and BL21(DE3)pLysS from Novagen. Other chemicals,
including glutaraldehyde, PNPG, and NBT, were from Sigma Chemicals
(St. Louis, MO).

Cloning of gusA. The E. coli gusA gene was amplified from W3110 cells
using Vent polymerase and the primers
5'-CCGGATCCTCTAGAGATGTTACGTCCTGTAGAAACC-3' and
5'-GCGAATTCTCCAGTCATTGTTTGCCTCCCTGCT-3' (XbaI and EcoRI sites
underlined). The PCR product was blunt-end ligated into the EcoRV site of
pGEM-5 by standard methods^^. Escherichia coli INVαF' cells were
transformed by the method described by Inoue et al. The gusA gene was
subcloned into pBluescript II SK(+) using restriction endonucleases XbaI
and EcoRI. The nucleotides encoding the lacZ α-fragment that would
normally have been located between the ribosome binding site and the gusA
start codon were deleted by amplifying the remainder of the plasmid using
primers 5'-CCGGATCCTCTAGAGATGTTACGTCCTGTAGAAACC-3' and
5'-CGTCTAGAAGCTGTTTCCTGTGTGAAATTG-3', digesting with XbaI, ligating, and
transforming pREP4/INVαF' (see below). The resultant construct placed the
gusA gene under direct control of the lac promoter. The GUS expression
vector was named gusA-pBSA.

Library construction and screening. For the first round of screening,
random mutations were introduced into the cDNA by mutagenic PCR'^ using
primers 5'-CCCAGTCACGACGTTGTAAAACGACG-3' and
5'-ATGCTTCCGGCTCGTATGTTGTGTGG-3', which anneal to the pBluescript II SK(+)
vector outside of the boundaries of the gusA insert. The amplification
reaction was carried out with 100 nM primers, 60 mM Tris-HCl pH 8.5, 15 mM
(NH4)2SO4, 3.2 mM MgCl2, 0.125 mM MnCl2, 0.2 mM dGTP, 0.2 mM dATP,
0.4 mM dTTP, 0.4 mM dCTP, for 35 cycles of 94°C × 30 s, 72°C × 2 min. The
gusA-pBSA plasmid library was transformed into E. coli INVαF' cells
harboring the lacI expression vector pREP4. The plasmid was unstable when
propagated in E. coli INVαF' without pREP4, probably because the lac
repressor is not present at high enough levels to limit expression of
gusA. For colony-lift assays, gusA-pBSA/pREP4/INVαF' colonies were
propagated on liquid Luria Broth supplemented with 25 µg/ml kanamycin and
100 µg/ml ampicillin (LB-kan/amp) + 0.4% glucose plates for 12 h at 37°C.
The colonies were adsorbed to a nitrocellulose filter and transferred
colony side up to LB-kan/amp plates containing 0.5 mM isopropyl
β-D-thiogalactopyranoside (IPTG), and induced at 37°C for 12-24 h. The
nitrocellulose-bound colonies were transferred to GUS buffer (50 mM sodium
phosphate pH 7.0, 0.1% Triton X-100, 1 mM EDTA) containing 0.2%
glutaraldehyde and incubated for 20 min at 23°C. The filters were then
transferred to buffer containing 165 µg/ml X-gluc and 330 µg/ml NBT and
incubated for 10-30 min. The filters were incubated briefly in 4 M
guanidine hydrochloride to arrest color development. Those colonies on the
master plate corresponding to the darkest colony remnants on the filter
were isolated and amplified.

For subsequent rounds of screening, the alleles were randomly recombined
and mutated by DNA shuffling as described by Stemmer'^. In short, the gusA
variants were PCR amplified using the same primers as in the mutagenic PCR
reactions already described, partially digested with DNase I, and
reassembled in a PCR reaction without primers. The products were amplified
in a PCR with primers, then subcloned back into gusA-pBSA for screening.
The second and third rounds were carried out in the same way, except that
1.0% and 3.5% glutaraldehyde were used to fix the colonies before the
incubation in X-gluc and NBT. The most resistant round 3 clone was isolated
and sequenced at the University of Texas, Institute of Cellular and
Molecular Biology Core Facility using the Applied Biosystems protocol, via
the primers originally used for mutagenic PCR and two additional internal
primers: 5'-CGCCGGGAATGGTGATTACC-3' and 5'-CTGATGGTATCGGTGTGAGCG-3'.

In vitro characterization of enzyme activity. For the preparation of lysates,
gusA-pBSA/pREP4/GMS407 cells were propagated at 37°C in LB-kan/amp.
The gusA gene was induced by the addition of 0.5 mM IPTG to mid-log (OD600
= 0.3) cultures, and the induced cultures were grown overnight. Cells were
centrifuged, resuspended in distilled water, centrifuged again, and resuspended in
GUS buffer. Cells were lysed with the addition of 10 mM EDTA and 1 mg/ml
chicken lysozyme. The insoluble fraction was centrifuged down, and the aldehyde
resistance of the GUS in the supernatant was determined as follows.
Glutaraldehyde or formaldehyde was added to an aliquot of supernatant and
the mixture was incubated at 23°C for exactly 20 min. The mixture was then
diluted 1/100 into GUS buffer containing 0.5 mM PNPG. The hydrolysis of the
substrate was followed for 1 min at 23°C at 405 nm in a Shimadzu UV-1601
spectrophotometer. The absorption extinction coefficient of p-nitrophenol
under these conditions was 11.50 mM⁻¹ cm⁻¹. The initial rates of hydrolysis
were linear (data not shown).
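
The conversion from an absorbance slope at 405 nm to a hydrolysis rate follows from the Beer-Lambert law and the extinction coefficient quoted above. The sketch below is illustrative only: the function names are hypothetical, the 1 cm path length is an assumption (the text does not state the cuvette used), and the absorbance slopes in the example are made-up numbers, not data from this work.

```python
# Extinction coefficient of p-nitrophenol quoted in the text (mM^-1 cm^-1).
EPSILON_PNP = 11.50


def hydrolysis_rate_mM_per_min(delta_a405_per_min, path_cm=1.0, epsilon=EPSILON_PNP):
    """Convert an absorbance slope (A405 per min) to a product formation rate.

    Beer-Lambert: A = epsilon * path * concentration, so
    d[concentration]/dt = (dA/dt) / (epsilon * path).
    path_cm = 1.0 is an assumption, not a value given in the text.
    """
    return delta_a405_per_min / (epsilon * path_cm)


def residual_activity(rate_treated, rate_untreated):
    """Fraction of activity remaining after aldehyde treatment."""
    if rate_untreated <= 0:
        raise ValueError("untreated rate must be positive")
    return rate_treated / rate_untreated


# Hypothetical slopes for an untreated and an aldehyde-treated aliquot.
untreated = hydrolysis_rate_mM_per_min(0.115)
treated = hydrolysis_rate_mM_per_min(0.023)
print(untreated, residual_activity(treated, untreated))
```

Because both aliquots are diluted identically (1/100) before the assay, the dilution factor cancels in the residual-activity ratio.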

To generate purified GUS enzymes, the wild-type and evolved gusA genes
were amplified by PCR with the primers: 5'-GCTCTAGAG£MMGT- 
TACGTCCTGTAGAAACC-3' and 5'-GC£AmLTGCAGTCATTGTTTGC- 
CTCCCTGCT-3' and subcloned into the expression vector pET28a(+) using 
the restriction enzymes NdeI and EcoRI (sites underlined in primers). The
resultant genes were sequenced as described already to confirm that no addi- 
tional mutations had been introduced during amplification or cloning. The 
expression constructs were transformed into BL21 (DE3)/pLysS. The trans- 
formed strains were propagated and induced, and the proteins purified by 
nickel chelate chromatography, as suggested by Novagen (Madison, WI). The 
protein preparations were judged to be >99% pure following SDS-PAGE and 
Coomassie Blue staining (data not shown). Purified protein concentrations 
were determined via Bradford protein assays (Bio-Rad, Hercules, CA). 

A 10 pmol quantity of purified GUS protein was preincubated for 20 min at
23°C in 10 µl of GUS buffer (1 µM) containing 0% or 0.04% formaldehyde.
Then, 5 µl of protein solution were added to 1 ml of buffer (5 nM) containing
varying concentrations of PNPG, and the initial velocity of each reaction was
determined as already described. The kinetic parameters of the wild-type and
mutant enzymes were calculated by fitting the initial velocity values to the
Michaelis-Menten equation using the application Kaleidagraph 3.0.5
(Adelbeck Software, Reading, PA).
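
For readers who wish to reproduce this kind of analysis, the sketch below estimates Vmax and Km from initial velocities. It is not the procedure used here (Kaleidagraph was used to fit the Michaelis-Menten equation); it substitutes the Hanes-Woolf linearization, [S]/v = [S]/Vmax + Km/Vmax, solved by ordinary least squares, so that it runs with the Python standard library alone. All names and the example data are hypothetical.

```python
def michaelis_menten(s, vmax, km):
    """Initial velocity predicted by the Michaelis-Menten equation."""
    return vmax * s / (km + s)


def fit_hanes_woolf(substrate, velocity):
    """Estimate (vmax, km) from the Hanes-Woolf plot of [S]/v against [S].

    The fitted line has slope 1/Vmax and intercept Km/Vmax.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, velocity)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return vmax, km


# Noise-free synthetic data generated from assumed parameters (not measured values).
substrate_mM = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0]
velocities = [michaelis_menten(s, vmax=2.0, km=0.3) for s in substrate_mM]
print(fit_hanes_woolf(substrate_mM, velocities))
```

With real, noisy data a direct nonlinear fit weights the measurement errors differently from any linearization, so the two approaches will not give identical estimates.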

Expression of gusA in Xenopus embryos. The GUS^" gene was subcloned 
into pGUS N358S; this placed the gene downstream of a Kozak sequence^", so 
that its transcript could be recognized by eukaryotic translation systems. 
N358S GUS and GUS^^ were subcloned into the Xenopus expression vector 
p64TS. This plasmid provides in vitro-transcribed mRNAs with Xenopus glo- 
bin 5'- and 3'-untranslated regions and greatly increases the amount of protein 
translated from the mRNA^'. Capped mRNA was produced by in vitro tran- 
scription^^ of the clones already described using the Ambion mMessage 
mMachine SP6 protocol. In vitro transcriptions were also treated with DNase I,
and the mRNA was purified using a Sephadex G-25 spin column to minimize 
nonspecific toxicity effects. Purified mRNAs were resuspended in sterile water 
for injections. 

Female adult Xenopus were induced to ovulate with human chorionic 
gonadotropin, and eggs were fertilized in vitro. Embryos were dejellied in 3% 
cysteine solution and washed in 0.2x MMR. Embryos were then reared at
n-lS'C in 0.2x MMR. Microinjections were performed as described^. 
Embryos were fixed in MEMFA (0.1 M MOPS, pH 7.4 / 2 mM EGTA / 1 mM
MgSO4 / 3.7% formaldehyde) for 20 min, and embryos were washed 5 x 5 min
in 1x PBS. GUS activity was detected using 1 mg/ml X-gluc in a solution of 1x
PBS / 20 mM potassium ferricyanide / 20 mM potassium ferrocyanide / 2 mM
MgCl2 / 0.02% NP-40 at 37°C for 2 h. β-Galactosidase was detected using the
same buffer but substituting rose-gal for X-gluc. Injection experiments
repeated on different days with different preparations of mRNA gave similar results
(data not shown).
Introduction

The sequence of the human genome is at hand. Most 
scientists who use the sequence will rely on annotations 
that provide information about the number and loca- 
tion of genes and about their inferred protein products. 
Traditionally, genes have been annotated by scientists 
with a particular interest in them. However, annota- 
tion of the complete human genome sequence will have 
to be at least partially automated. Gene annotation in- 
corporates cDNA data (including expressed sequence 
tags [ESTs]), sequence similarity, and computational pre- 
dictions based on the recognition of probable splice 
sites and coding regions (Stormo 2000; also see David 
Haussler's Web site, Computational Genefinding). The 
state of the art was recently surveyed by the Genome 
Annotation Assessment Project-GASP1 and must be regarded
as imperfect (Bork 2000; Reese et al. 2000).

This review enumerates aspects of pre-mRNA splicing 
that limit our ability to predict gene structure from ge- 
nomic sequence, drawing on the recently annotated 
complete genome of Drosophila melanogaster (Adams 
et al. 2000) as an example. In particular, the following 
four facts will be discussed. First, splice sites do not 
always conform to consensus. Second, noncoding exons 
are common. Third, internal exons can be arbitrarily 
small, and small internal exons confound not only gene 
finding but also the alignment of cDNA and genomic 
sequences. Fourth, splice sites are not recognized in iso- 
lation, and nucleotides that are far from splice sites can 
affect splicing. This list and the accompanying analysis 
should make molecular geneticists aware of the ways 
in which gene annotations can be wrong and should 
encourage recourse to the primary data. In addition, the 
same considerations indicate that inherited disease can 
be caused by mutations remote from splice sites that
nevertheless affect splicing. 

Discussion 

Splice Sites Do Not Always Conform to Consensus 

It is well established that nearly all splice sites conform 
to consensus sequences (Mount 1982; Senapathy et al. 
1990; Zhang 1998). These consensus sequences include 
nearly invariant dinucleotides at each end of the in- 
tron — GT at the 5' end of the intron and AG at the 3' 
end of the intron. Most gene-finding software and most 
human annotators will find only introns that begin with 
a GT and end with an AG. However, nonconsensus 
splice sites have been described, and I will discuss three 
classes, in decreasing order of frequency. 

The most common class of nonconsensus splice sites 
consists of 5' splice sites with a GC dinucleotide. Sen- 
apathy et al. (1990) listed 17 examples among 3,724 5' 
splice sites, suggesting a frequency of ~0.5%. Jackson
(1991) listed a total of 26 GC sites, whereas Wu and 
Krainer (1999) cited an additional 18 examples. GC 5' 
splice sites are consistent with the experimental obser- 
vation that, of the six possible point mutations within 
the GT dinucleotide, mutation of T to C in position 2 
has the smallest effect on in vitro splicing (Aebi et al. 
1986). At other positions within the consensus, GC sites 
conform extremely well to the standard consensus; for 
example, 42 of the 44 sites cited above have a consensus 
G residue at both position −1 and position +5. It is
reasonable to assume that GC sites are recognized by 
the standard (U2-dependent) spliceosome. 

The second class of exception to splice-site consensus 
is U12 introns, a minor class of rare introns with splice- 
site sequences that are very different from the standard 
consensus but that are very similar to each other. The 
existence of this class was first pointed out by Jackson 
(1991) and was considered in more detail by Hall and 
Padgett (1994). It was subsequently discovered that U12 
introns are removed by a minor spliceosome containing 
the rare U11, U12, U4atac, and U6atac snRNPs, in place
of U1, U2, U4, and U6 (Tarn and Steitz 1997; Burge et
al. 1998). Some U12 introns have AT and AC in place
of GT and AG and are known as "AT-AC" introns. 
However, terminal intron dinucleotide sequences do not 
distinguish between U2- and U12-dependent introns
(Dietrich et al. 1997). Rather, U12 introns can be iden- 
tified by highly conserved sequences at the 5' splice site 
(RTATCCTY; R = A or G, and Y = C or T) and branch 
site (TCCTRAY). U12 introns are found in many eu- 
karyotes, including Drosophila melanogaster (Adams et 
al. 2000) and Arabidopsis thaliana (Shukla and Padgett 
1999) but not Caenorhabditis elegans. 
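
The distinction drawn above, that terminal dinucleotides alone do not identify U12 introns whereas the 5' splice-site and branch-site motifs do, can be expressed as a short pattern-matching sketch. This is an illustration under stated assumptions, not a published classifier: the function names are hypothetical, the 5' motif is assumed to begin at the first intron nucleotide, and the 40-nucleotide window searched for the branch site is an arbitrary choice made for the example.

```python
import re

# IUPAC ambiguity codes used in the text: R = A or G, Y = C or T.
IUPAC = {"R": "[AG]", "Y": "[CT]"}


def iupac_to_regex(pattern):
    """Translate a motif containing R/Y codes into a regular expression."""
    return "".join(IUPAC.get(ch, ch) for ch in pattern)


U12_DONOR = re.compile(iupac_to_regex("RTATCCTY"))
U12_BRANCH = re.compile(iupac_to_regex("TCCTRAY"))


def classify_intron(intron, branch_window=40):
    """Report terminal dinucleotides and whether both U12 motifs are present.

    branch_window (how far upstream of the 3' end to look for the branch
    site) is an assumed value chosen for this example.
    """
    intron = intron.upper()
    ends = intron[:2] + "-" + intron[-2:]
    has_donor = U12_DONOR.match(intron) is not None
    has_branch = U12_BRANCH.search(intron[-branch_window:]) is not None
    return {"terminal_dinucleotides": ends, "u12_like": has_donor and has_branch}


# Synthetic introns, constructed only to exercise the three cases.
print(classify_intron("GTATCCTT" + "A" * 30 + "TCCTAAC" + "C" * 10 + "AG"))
print(classify_intron("GTAAGT" + "A" * 30 + "CTAACTTTTTCAG"))
print(classify_intron("ATATCCTT" + "G" * 20 + "TCCTGAC" + "T" * 8 + "AC"))
```

The first and third synthetic introns both carry the U12 motifs although one ends GT-AG and the other AT-AC, which is the point made in the text: the terminal dinucleotides are not the distinguishing feature.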

Finally, there are a small number of nonconsensus sites 
that fit into neither of the two categories mentioned 
above. Many reports of such variant splice sites can be 
traced to errors in annotation or interpretation, poly- 
morphic differences between the sources of cDNA and 
genomic sequence, inclusion of pseudogene sequences, 
or failure to account for somatic mutation (author's un- 
published data; for examples, see Jackson 1991). How- 
ever, there are many examples of sites that match the 
consensus very poorly, and experimental work has established
that 5' splice sites do not absolutely require
GT, and that 3' splice sites do not absolutely require
AG, in order to be recognized in vivo (Aebi et al. 1986;
Roller et al. 2000, and references therein). In yeast, an
intron that is within the HAC1 mRNA and that has no
similarity to the standard nuclear pre-mRNA intron consensus
sequence is spliced by a specific, regulated endonuclease
and tRNA ligase (Sidrauski et al. 1996). This
donuclease and tRNA ligase (Sidrauski et al. 1996). This 
intron provides a precedent for introns in protein-coding 
genes with completely novel splice sites. 

Noncoding Exons Are Common 

There is considerable confusion between exons and 
coding regions. The term "exon" was coined by Gilbert 
(1978) to refer to what is left when introns are removed 
by splicing, and RNAs that are entirely noncoding (such 
as tRNAs) are sometimes spliced. However, the term 
exon is often misused to refer to a stretch of coding 
information. In reality, noncoding exons are
quite common, occurring in >35% of human genes 
(Zhang 1998). Gene-finding software generally detects 
sequence features characteristic of coding regions rather 
than of exons and does not even attempt to identify 
noncoding exons, or noncoding portions of exons. This 
is because the statistical biases introduced by protein- 
coding are in fact a very powerful tool for the identifi- 
cation of coding DNA, and no similar tool has been 
developed for the identification of noncoding exons. 

A similar problem can arise in genes without non- 
coding exons. If the first intron occurs near the initiator 
AUG, then the coding information in the first exon can 
be very short and difficult to identify by measures of 
coding tendency. Furthermore, the first intron tends to 
be longer than average (Maroni 1996), and such an ar- 
rangement can separate promoter function (perhaps in- 
cluding downstream transcriptional enhancer elements 
lying in the first intron) from the bulk of the coding
information downstream. In these cases, investigators 
have no way of knowing how much information is missing,
or where the 5' end of the gene is likely to reside,
without experimental data such as a cDNA sequence
or a 5' EST.

Internal Exons Can Be Arbitrarily Small

A less frequent but perhaps more serious problem for 
gene-discovery methods is posed by small internal exons. 
Vertebrate internal exons have an average size of ~130
nucleotides (Hawkins 1988; Zhang 1998), and roughly
65% of internal human exons are 68-208 nucleotides
in length (Maroni 1996). This size distribution reflects
a functional constraint. Optimal splicing efficiency requires
exons with sizes of ~50-300 nucleotides (Robberson
et al. 1990; Dominski and Kole 1991; see review
by Berget 1995). However, a considerable number,
>10%, of exons are <60 nucleotides in length, and it is 
these exons that can be difficult to identify by measures 
of coding tendency. 

Just how small can internal exons be? There appears
to be no lower limit, and many cases of exons <10 nucleotides
have been described (for examples, see Stamm
et al. 1994; also see the author's Web site, Gene Annotation
and Splice Site Selection). An illustrative case
is the invected gene of D. melanogaster (also listed in
GadFly as CG17835). This gene encodes a homeodomain
protein that is similar to engrailed, and these two
genes are adjacent. One of four invected exons is only
6 nucleotides long and is flanked by introns of 27,659
and 1,134 nucleotides. Significantly, this exon is not recognized
by cDNA alignment software such as SIM4 (Florea
et al. 1998), and the gene is incorrectly annotated
(GenBank accession number AE003825.1). As a result,
the protein sequence predicted by annotation of the
genome (Adams et al. 2000; GenBank accession number
AAF58640) differs from that predicted from the
cDNA (Coleman et al. 1987; GenBank accession number
CAA28885), because of a frameshift affecting the entire
carboxyl-terminal coding exon, a highly conserved region
of the protein. This is despite the fact that the microexon
sequence, GTCGAA, is flanked by intron sequences
that perfectly match the splice-site consensus.
Use of this microexon provides perfect agreement between
the cDNA and genomic sequences when consensus
splice sites are used, whereas the annotation predicts
an RNA with several discrepancies relative to the cDNA.
The frameshift is due to the predicted use of a 5' splice
site 10 nucleotides downstream of the true 5' splice site,
which was apparently selected to account for the microexon.
It seems clear that the protein sequence predicted
by the cDNA is correct. Why was it not incorporated
into the annotation? The alignment problem
arises because a pattern-matching algorithm that locates
exons by similarity between the cDNA and the genomic
sequence cannot find exons of this size unless its stringency
is reduced to an unacceptable level (Florea et al.
1998).
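
A rough calculation shows why similarity alone cannot place a 6-nucleotide exon. The sketch below is a back-of-the-envelope illustration, assuming uniformly random, independent bases (real genomic sequence is neither); the function name is hypothetical. It computes the expected number of chance occurrences of a k-mer in the interval spanned by the two invected introns and the microexon.

```python
def expected_chance_matches(target_length, k, alphabet_size=4):
    """Expected number of exact matches of one k-mer in a random sequence.

    Assumes every base is drawn independently and uniformly, which is a
    simplification used only for this illustration.
    """
    positions = target_length - k + 1
    if positions <= 0:
        return 0.0
    return positions / alphabet_size ** k


# Interval between the flanking exons: both introns plus the microexon itself.
interval = 27_659 + 6 + 1_134

print(expected_chance_matches(interval, 6))    # a 6-nt microexon
print(expected_chance_matches(interval, 20))   # a 20-nt exon, for comparison
```

Under these assumptions a 6-mer is expected to occur about seven times by chance in this interval, whereas a 20-mer is expected essentially never, so an aligner must either ignore such short matches or accept many false ones unless it also uses splice-site information.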

The notion that exons can be arbitrarily small is sup- 
ported by the observation of exons with length 0. Of 
course, such sites are not exons at all but, rather, are 
resplicing sites (see fig. 1). This phenomenon has been 
demonstrated in the case of the Drosophila Ultrabi- 
thorax locus (Hatton et al. 1998), which has a region 
of 60 kb containing two alternatively spliced exons, and 
may be a general feature of long introns (J. Burnette and 
A. J. Lopez, personal communication). The existence of 
resplicing sites not only illustrates the lack of a lower 
limit to exon size (which has implications for gene an- 
notation) but also has implications for the analysis of 
hereditary mutations. A mutation at one of these sites 
could potentially create a frozen intermediate such as 
that diagrammed in figure 1. This partially spliced RNA 
would probably be unstable, because of nonsense-me- 
diated decay (Culbertson 1999), and the apparent result 
would be no RNA (rather than aberrantly spliced RNA). 
Such mutations would be very hard to identify. 

Nucleotides Far from Splice Sites Can Affect Splicing 

No method of evaluating potential splice sites that is 
based on sequence alone can be 100% reliable. One can 
be sure of this because many sequences that are not splice 
sites are capable of acting as splice sites, and vice versa. 
Perhaps the clearest demonstration of this is provided 
by the activation of cryptic splice sites. These are splice 
sites that are used, sometimes with 100% efficiency, 
when a natural splice site has been mutationally inac- 
tivated. The activation of cryptic sites occurs in ap- 
proximately one-third of splicing mutations (Nakai and 
Sakamoto 1994). The phenomenon shows that the cryp- 
tic sites are perfectly capable of being recognized by the 
splicing machinery. Clearly, the sequence of such cryptic 
sites is compatible with splicing, and context is impor- 
tant for splice-site choice. 

Two contextual elements that contribute to splice- 
site selection are the location of splice sites relative to 
each other and splicing-enhancer sequences. The exon- 
size preferences described above are widely understood 
in terms of an exon-definition model that includes the 
interaction of splicing factors bound at either end of an 
exon (Berget 1995). The requirement for productive interactions
among splicing factors, including U1 snRNPs
at the 5' splice site and U2 snRNP auxiliary factor
(U2AF) at the 3' splice site, is thought to give rise to
preferred exon lengths because of steric constraints and
geometry favoring interactions. In the case of small in- 
trons, a similar model of intron bridging has been pro- 
Figure 1 Small internal exons and resplicing. This schematic
figure indicates the pathway of resplicing demonstrated for the Drosophila
Ubx locus (Hatton et al. 1998). The thicker vertical line indicates
a resplicing site, which does not contribute any nucleotides to
the final mRNA product. The same pathway could be followed in the
case of a microexon, in which case an arbitrarily small number of
nucleotides would remain in the mRNA product. "Up. Exon" and
"Down. Exon" denote the exons upstream and downstream of the
resplicing site, respectively. In the case of Ubx, the sequence immediately
downstream of the resplicing site is an alternatively spliced
exon (here designated "Alt. Exon"), but resplicing sites are not always
accompanied by such alternatively spliced exons (J. Burnette and A.
J. Lopez, personal communication).

posed (Guo and Mount 1995; McCullough and Berget 
1997). In combination, these models suggest that, in 
order to be recognized, a splice site must have a partner 
an appropriate distance away, so that either exon defi- 
nition or intron definition is facilitated by the spacing. 
One experimental distinction between exon definition 
and intron definition is the result of mutations that in- 
activate the splice site. Failure to undergo exon definition 
results in exon skipping, whereas failure to undergo in- 
tron definition results in intron retention. 

Not only is the use of one splice site dependent on the 
presence of its partner across the exon, but weakness in 
one partner can be compensated by strength in the other, 
as seen with second-site revertants of splice-site muta- 
tions that cause exon skipping. In an analysis of splicing 
mutations at the dihydrofolate reductase locus, Caroth- 
ers et al. (1993) found that a mutation at the 5' splice 
site of exon 5 (G to C in the third position of the intron) 
could be partially reversed by mutations that increased 
the strength of the 3' splice site upstream of the same 
exon (AAAG| to TTAG|, ACAG|, or ATAG|). Although
reversion was not complete, these data provide
a strong argument that whether a sequence functions as 
a splice site depends not only on its intrinsic strength 
but also on its context. Similarly, there are mutations 
that create splice sites within introns, activating cryptic 
exons by recruitment of appropriately placed partners 
(e.g., see Bagnall et al. 1999). 

Splicing enhancers are sequences that stimulate 
splicing at nearby sites. A family of non-snRNP splic- 
ing factors known as "SR proteins" appear to be im- 
portant for the recognition of splicing enhancers in 
exons (Blencowe 2000). A splicing difference between
SMN1 and SMN2, which explains their differential
effects on spinal muscular atrophy, has been attrib-
uted to a translationally silent substitution within the
coding sequence that affects splicing (Lorson et al.
1999). Similarly, H.-X. Liu, L. Cartegni, M. Q.
Zhang, and A. R. Krainer (personal communication)
have shown that a nonsense mutation causing the
skipping of BRCA1 exon 18 affects splicing in vitro
and that a missense mutation at the same position can 
also cause exon skipping. There are also splicing-en- 
hancer sequences in introns — and examples of mu- 
tations that affect them (Cogan et al. 1997). Although 
general mechanisms for their function have yet to be 
defined, there is some evidence that at least some splic- 
ing enhancers in introns may act by facilitating exon 
definition in the case of small exons (Carlo et al.
2000).

Outlook 

This review has presented aspects of pre-mRNA splicing 
that pose special problems for gene annotation. How- 
ever, even though the best gene finders predict genes 
exactly right less than half the time, 95% of total coding 
nucleotides are predicted accurately, and <5% of genes 
are completely missed (Reese et al. 2000; Genome
Annotation Assessment Project-GASP1). When cDNA and
homology data are available, annotations will tend to 
be even better. Thus, one would be wrong to conclude 
from this review that the gene annotations attending the 
human genome sequence will not provide an extremely 
valuable resource. Nevertheless, molecular geneticists 
will want to have an understanding of the kinds of errors 
that are likely to occur — and to carefully review the 
available evidence for genes that matter to them. An- 
notators are likewise obligated to make the source of 
each specific aspect of their annotation an integral part 
of the annotation; for example, if part of the annotation 
is supported by an EST whereas the rest of it is based on
the prediction of a gene finder, then the limits of the 
cDNA should be indicated, and the accession number 
of the EST should be part of the annotation. 

A related but distinct point is that these same factors 
are also relevant when candidate mutations are evalu- 
ated during the analysis of hereditary disease. Mutations 
that lie within splicing enhancers, at resplicing sites, or
at cryptic splice sites can affect splicing even when they 
lie some distance from the splice sites actually used in 
the generation of the affected mRNA. The problem is 
further compounded by alternative splicing and the in- 
terplay between splicing and polyadenylation, topics 
that are beyond the scope of the present review. 

In summary, gene annotations will be a valuable re- 
source. However, they will not substitute for expertise 
in molecular genetics. 
Abstract:

We have examined codon bias in 207 plant gene sequences collected from Genbank and 
the literature. When this sample was further divided into 53 monocot and 154 dicot genes, 
the pattern of relative use of synonymous codons was shown to differ between these 
taxonomic groups, primarily in the use of G+C in the degenerate third base. Maize and 
soybean codon bias were examined separately and followed the monocot and dicot codon 
usage patterns respectively. Codon preference in ribulose 1,5 bisphosphate and chlorophyll
a/b binding protein, two of the most abundant proteins in leaves, was investigated. These
highly expressed genes are more restricted in their codon usage than plant genes in general.

Introduction:

With the exception of Met and Trp, all amino acids are encoded by two to six 
synonymous codons. In the majority of species studied to date, an organism's use of 
synonymous codons is not random (1-5). However, detailed characterization of specific 
patterns of codon usage has been reported primarily for unicellular organisms, including E.
coli (6-8), Bacillus (9,10), Agrobacterium (11) and yeast (12-17). The pattern of codon usage
in higher eukaryotes has been examined in only a limited number of species, including 
Drosophila (19) and man (13, 18-21). 

In the last three years, a large number of DNA sequences of higher plant genes have
been reported, enabling us to extend the initial analyses of plant codon usage previously 
reported (13, 18, 22,23). We have used an expanded sample of 207 plant gene sequences to 
examine some general observations about codon usage. 

In general, genes within a taxonomic group exhibit similarities in codon choice, 
regardless of the function of these genes. Thus an estimate of the overall use of the genetic 
code by a taxonomic group can be obtained by summing codon frequencies of all its
sequenced genes. This species-specific codon choice has been called a "codon dialect" by 
Ikemura (13). Here we report on the "codon dialect" of 207 plant genes. We have broken 
this sample down into monocotyledonous and dicotyledonous plants to determine whether 
these broader taxonomic groups are characterized by different patterns of synonymous 
codon preference. Finally, we report the codon dialect of maize and soybean, since over 25 
genes have been sequenced in each of these agronomically important species.

Bias in codon choice within genes in a single species appears related to the level of 
expression of the protein encoded by that gene (6, 7, 12-18). Codon bias is most extreme in 
highly expressed proteins of E. coli and yeast. In these organisms, a strong positive 
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correlation has been reported between the abundance of an isoaccepting tRNA species and 
the favored synonymous codon. In one group of highly expressed proteins in yeast, over 
96% of the amino acids are encoded by only 25 of the 61 available codons (15). These 25 
codons are preferred in all sequenced yeast genes, but the degree of preference varies with 
the level of expression of the genes. Recently, Hoekema and colleagues (24) report that
replacement of these 25 preferred codons by minor codons in the 5' end of the highly
expressed yeast gene PGK1 results in a decreased level of both protein and mRNA. They
conclude that biased codon choice in highly expressed genes enhances translation and is 
required for maintaining mRNA stability in yeast. The degree of codon bias may be a factor 
to consider when engineering high expression of heterologous genes in yeast and other 
systems. 

In plants, the ribulose 1,5 bisphosphate small subunit (RuBPC SSU) and chlorophyll a/b
binding protein (CAB) gene families encode two of the most abundant proteins in leaves.
These genes have been sequenced in a number of different plants, and we have examined
their codon usage in detail to determine whether they are more biased than other plant 
genes. 

Data and Methods: 

The 207 plant genes included in the sample are detailed in Tables 1 and 2. Sources for 
the data were Genbank (release 55) or original publications, referenced in Tables 1 and 2 
and listed in the Appendix. Genes for which only genomic sequence was available were 
included, and introns were removed before codon usage data was generated. Partial 
sequences available for some genes were included when reading frame could be determined. 

Homologous genes in different species, or multigene families within a species, have
been sequenced, including those for zein, glycinin, vicilin, CAB and RuBPC SSU. Multiple
sequences of these genes were included in the sample if they differed by a minimum of 10% 
in the base composition of their protein coding regions. As a result, this sample may contain 
some bias towards codon usage in highly expressed genes. 

Genbank sequences were extracted using the GENBANK program (25). Codon usage 
tables were compiled using the program CODONFREQUENCY from the program library 
of the University of Wisconsin Genetics Computer group (26). 
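
The kind of tabulation described here can be sketched in a few lines. The following is a minimal illustration only, not the CODONFREQUENCY program: the function name `codon_usage` and the input format (a list of in-frame coding sequences with introns already removed) are assumptions made for the example. It counts each codon and expresses it as a percentage of all codons for the same amino acid, which is how the "No." and "%" columns of the codon usage tables are defined.

```python
from collections import Counter, defaultdict

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_usage(coding_sequences):
    """Return {codon: (count, percent within its amino acid)}.

    Assumes every sequence starts in frame and contains no introns.
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        # Step through complete triplets only; a trailing partial codon is ignored.
        for pos in range(0, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Total codons observed per amino acid (stop codons pooled under "*").
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n

    return {
        codon: (n, 100.0 * n / totals[CODON_TABLE[codon]])
        for codon, n in counts.items()
    }


if __name__ == "__main__":
    usage = codon_usage(["ATGGGAGGAGGCTAA"])
    for codon in sorted(usage):
        n, pct = usage[codon]
        print(codon, CODON_TABLE[codon], n, round(pct))
```

Pooling a taxonomic group then amounts to passing all of its coding sequences in one call.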

Results:

Plant genes coding for proteins with a wide variety of functions have now been
sequenced. We have tabulated the sequences of 207 plant genes (Tables 1 and 2) from 6 
monocot and 36 dicot species. These proteins are present in a wide range of plant tissues at 
varying levels of expression. However, to date, only a few plant genes encoding rare 
proteins and/or mRNAs have been sequenced. 

We have calculated the codon usage profile of the pooled plant sample and separate 
codon usage profiles for the monocotyledonous and dicotyledonous groups of plants (Table 
3). This division reveals that the relative use of synonymous codons differs between the 
monocots and the dicots. Since the monocot sample is one third smaller than the dicot 
sample, we were concerned that the relative abundance of storage protein genes could skew 
its codon usage profile. Accordingly, we calculated codon usage in the monocots without
these genes (Table 3). 
Table 1

Descriptions and sources of 53 monocot genes included in the analysis.

GENUS/SPECIES     GENBANK      PROTEIN                                           REF

                               RuBPC small subunit                               24
Secale cereale    RYESECGSR    γ-secalin                                         25
Zea mays          MZEAIG       40.1 kD A1 protein (NADPH-dependent reductase)    26
                  MZEACnC      Actin                                             27
                  MZEADH11F    Alcohol dehydrogenase 1                           28
                  MZEADH2NR    Alcohol dehydrogenase 2                           28
                  MZEALD       Aldolase                                          29
                  MZEANT       ATP/ADP translocator                              30
                  MZEEG2R      Glutelin 2                                        31
                  MZEGGST3B    Glutathione S transferase                         32
                  MZEH3C2      Histone 3                                         33
                  MZEH4C14     Histone 4                                         34
                  MZEHSP701    70 kD Heat shock protein, exon 1                  35
                               70 kD Heat shock protein, exon 2                  35
                  MZELHCP      CAB                                               36
                  MZEMPt)      Lipid body surface protein L3                     37
                  MZEPCFCR     Phosphoenolpyruvate carboxylase                   38
                  MZERBCS      RuBPC small subunit                               39
                  MZESUSYSG    Sucrose synthetase                                40
                  MZETP12      Triosephosphate isomerase 1                       41
                  MZEZEA20M    19 kD zein                                        42
                  MZEZEA30M    19 kD zein                                        42
                  MZEZEUA3     15 kD zein                                        43
                  MZEZEK       16 kD zein                                        44
                  MZEZE19A     19 kD zein                                        45
                  MZEZE22A     22 kD zein                                        46
                  MZEZE22B     22 kD zein                                        46
                               Catalase 2                                        47
                               Regulatory C1 locus                               48

Data was obtained from GenBank (release 55) or, when no GenBank file name is specified, directly from

In general, the most important factor in discriminating between monocot and dicot 
patterns of codon usage is the percentage G+C content of the degenerate third base. In 
monocots, 16 of 18 amino acids favor G+C in this position, while dicots only favor G + C in 
7 of 18 amino acids. 
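
The count of amino acids that favor G+C in the third position can be illustrated with a short sketch. The text does not spell out the exact criterion, so the rule used here (G- and C-ending codons together account for more than half of the codons for that amino acid) is an assumption, as are the function name `gc3_favoring` and the nested-dictionary input. The counts are the Glu and Asp rows of Table 3.

```python
def gc3_favoring(usage_by_amino_acid):
    """Return the amino acids whose G- or C-ending codons exceed 50% of usage.

    usage_by_amino_acid maps amino acid -> {codon: count}.
    The 50% threshold is an illustrative assumption, not the paper's stated rule.
    """
    favoring = []
    for amino_acid, counts in usage_by_amino_acid.items():
        total = sum(counts.values())
        gc_ending = sum(n for codon, n in counts.items() if codon[2] in "GC")
        if total and gc_ending / total > 0.5:
            favoring.append(amino_acid)
    return favoring


# Two rows of Table 3, for illustration only (the full table has 18 such amino acids).
dicots = {
    "Glu": {"GAG": 1498, "GAA": 1419},
    "Asp": {"GAT": 1269, "GAC": 927},
}
monocots = {
    "Glu": {"GAG": 604, "GAA": 197},
    "Asp": {"GAT": 189, "GAC": 514},
}

if __name__ == "__main__":
    print("dicots:", gc3_favoring(dicots))
    print("monocots:", gc3_favoring(monocots))
```

On these two rows the dicot sample favors G+C only for Glu, whereas the monocot sample favors it for both amino acids.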
Table 2

Descriptions and sources of 154 dicot genes included in the analysis.



GENUS/ SPECIES 


GENBANK 


PROTEIN 


REF 


^ MfnviiiftuMf fM/riuc 


AMACHS 


Chatconc synthetase 


49 


Atxxbidopsis tholiorux 






50 






51 




ATnnvn 


nisionc <7 ^ 


51 




AitU14ivA 


Histonc 4 1 


51 




ATHIJICPl 


CAB 


52 






at tubulin 


53 






^■^noipynivyHniiaic >-pnu5pndic jh 








synthetase 


55 


BcnhcUetia acelsa 




Hi^ methionine storage protein 


Brassica campestris 




Acyl corner protein 


56 


Bntssica napus 


DNAMAr 


Napin 


57 


Bntssica oteacea 




S*locus specific glycoprotein 


5g 


douMjlic ensifomtis 




Concan&valin A 


59 


Carica papaya 


CPAPAP 


Papain 


w 


Qtiamdomonas 








ftinhardtii 


CREC552 


Prcapocytoch rome 


At 
01 




CRERBCSl 


RuBPC small subunit gene 1 






CRLRBCSZ 


RuBPC small subunit gene 2 


62 


Cucurtim ptpo 


CUCPHT 


Phytochiome 


OJ 


Cucumis saavus 


CUSCMS 


Gl^xofiomal malate synthetase 










65 




cusssu 


RuBPC small subunit 


OkJ 


Daucus carvta 




cjoeiviii 


66 




T\A DL'V'IV 


33 kO cxtensin related protein 


67 


DoUchos biponts 


DBILECS 


seed lectin 


Aft 


FUii>eria oinervia 


FTRBCR 


RuBPC small subunit 


oy 


Gfydne max 


M/l7aAA 


73 storage protein 


70 




SOYACTIG 


Actin 1 


£/ 




SOYCIIrl 


CI I protease inhibitor 


71 
/I 




cnv/^Y VA t A 

SOYGLYAIA 


Glycinin Ala Bx subunits 


77 
ti 




SOYGLYAAo 


Glycinin A5A4B3 subunits 


TX 

fj 




SOYGLYAB 


Gl)cinin A3/b4 subunits 


7/1 




SOYGLYk 


Glycinin A2Bla subunits 


/3 




SOYHSPITS 


Low M W heat shock proteins 


to 




SOYLGBI 


Lcghemoglobin 


77 




SOYLEA 


Lectin 


78 




SOYLOX 


Lipoxygenase 1 


79 




SOYNOD20G 


20 kDa nodulin 


80 




SOYNOD23G 


23 kDa nodulin 


81 




SOYNOD24H 


24 kDa nodulin 


82 




SOYNODUB 


26 kDa nodulin 


83 




SOYNOD26R 


26 kDa nodulin 


84 




SOYNOD27R 


27 kDa nodulin 


S3 




SOYNOD35M 


35 kDa nodulin 


86 




SOYNOD75 


75 kDa nodulin 


86 




SOYNODRl 


NoduUn CSl 


87 




SOYNODia 


Nodulin E27 


87 




SOYPRPl 


Proline rich protein 


88 




SOYRUBP 


RuBPC small subunit 


89 




SOYURA 


Urease 


90 




SOYHSP26A 


Heat shock protein 26A 


91 






Nuclear-encoded chloroplast 


92 






heat shock protein 








22 kDa nodulin 


80 






0\ tubulin 


93 






ffl tubulin 


93 


Gossypium hasumm 




Seed a globulin (violin) 


94 






Seed globulin (vicilin) 


94 


Hetiamhus annus 


HNNRUBCS 


RuBPC small subunit 


93 






2S albumin teed stoivge protein 


96 


Ipomoea bauuas 




Wound-induced cats Use 


97 


Lemnagibba 


LGIAB19 


CAB 


98 


LGIR5BPC 


RuBPC small subunit 


99 


Lupinus Itueus 


LUPLBR 


leghemoglobin I 


100 


Lycopffsicon 






101 


e$0ilfnxutn 


TOMBIOBR 


Biotin binding protein 




TOMETHYBR 

ft X'iTftSi ft ftft ft ft#ft% 


EthylcDc biosynthcstt protein 


102 






PotygiUscturODftse-2ft 


103 




TOMPSI 


Xonuto pbotocystem I protein 


104 




TOMRBCS4 


RuBPC snail cubunit 


105 






RuRPP cnull cubunit 


105 






RuBPC small cubunit 


105 




TOMRBCSD 


RuBPC small subuntt 


106 






Ripening related protein 


107 




TOMWIPIG 


Wound induced proteinase 


108 






inhibitor 1 






TOMWIPII 


Wound induced proteinase 


109 






inhihitor 11 








CAB lA 


110 






CAB IB 


110 






CAB3C 


110 






CAB4 


111 






CABS 


111 


mwOiCugO JUUMU 




l^vhetnofflobin III 


112 










/WICM/Z/ftlifM 
fJfyMUUifMmfJt 




RuBPC small subunit 


113 










ptutnbaffiufolUi 




iviiiwnviiwnu /% A r ftjnuiuc 


114 














Nitrate reductase 


115 








116 


Nicotio/us toifocunt 


TOBECH 


Bndochitinase 


117 




TOBGAPA 


A subunit of chloroplast G3PD 


118 




TOBGAPB 


B subunit 6[ chloroplast G3P0 


118 




TOBGAPC 


C subunit of chloroplast G3PD 


118 




TDBPRIAR 


Pathosenesis related orocein la 


119 




TDBPR1CR 


Patbogeoesi^rclated protein Ic 


119 




TOBPRNl 


Pathogenesis related protein lb 


120 






Peroxxdasc 


121 






RuRPf* cmatl suhunit 


122 




TOBTHAUR 
A\jD Ann 


IT^fV'induced protein homologous 


123 






to ttuu Rutin 




r€TStUS WtUnCOtUl 






124 


Peooseiinum 








hoftenst 


PHOCHL 


LJiaieopc synuHuc 


125 


Petunia sp. 


wiv 'Ann 


PAR M 


126 


rElVAJIZZLi 




126 






CAB22R 


126 






CAB 25 


126 








127 




PEFCAB91R 


PAR 01 R 


127 




FETCHSR 


Cbaloooe sj^tbase 


128 




PETGCRl 


vnjCUic>nni protcui 


129 




PEIRBCSW 


RuBPC small subunit 


130 




F&rRBCSll 


RuBPC small subunit 


iJU 






70 kOa beat shock protein 


131 


PHastolus wlffffis 


PHVCHM 


Chitinaac 


132 


PHVDLECA 


Phytobcnkagglutinin E 


133 




PHVDLECB 


Pt^rtohemagglutinin L 


133 






UlUlBIIllDB »jMl>ivlK^ k 


134 




PHVGSR2 


Qucamine synthetase 2 


134 




PHVLBA 


Legherooglobtn 


135 




PHVLECr 


Lectin 


136 




PHVPAL 


Phenylalanine ammonia lyase 


137 




PHVPHASAR 


dphaseotin 


138 




PHVPHASBR 


^phaseoUn 


138 






Arcelin seed protein 


139 






Chakooe synthase 


140 


PisumsatMtm 


PEAALB2 


Seed albumin 


141 




PEACABSO 


CAB 


142 




PEAGSRl 


Glutamine synthetase (nodule) 


143 






Lectin 


144 






l^ltUlIlllI 


145 






QiiRP^ cmnll ciiKiinit 


146 




rcAVICZ 


viciiin 


147 




PEAVIC4 


Vicilin 


147 






Vtcilin 


147 








148 






uiuiBiniiic synuiciasc 


143 






vjluiBniiiic syninciBsc ^roui^ 


143 






Htstone 1 


149 






tNUCICai cncouco CfllorapiaSl 


92 






ncii MiocK proicin 




Raphanus sadvus 




KUDrvr small suDunii 


150 


Ridnus communis 


RCCAGG 


Agglutinin 






RCCRICIN 


Rkin 








Isocitnitc lyssc 


153 


SiUne pratensis 


blfrllA 


FcfTodoxin prccufsor 


154 






lt^9ff#/^^M Mill n^i^^iiffVOr 

riaSiocyanin pTccureur 


155 


Sinapisalba 


aALUArllH 


Nuclcdf gene for G3PD 


156 


Solanum tuberosum 


rVTrAl 


Pataun 


157 




rUTINHWl 


wounO'lnuucca piuicinaic 


158 






inhibitor 








Light'induciblc tissue specific 


159 






oi-Lol gene 








wounO'inuucca pivicinosc 


160 






inniDiior 11 








KUDrv. SITUII SUDUnil 


161 






Sucrose synthetase 


162 


Spinocia oiaacea 


SPIACPI 


Acyl carrier protein 1 


163 


SPIOEC16 


16 kDa photosynthetic 


164 






oxygen-<voWing protein 






SnOEC23 


23 kDa photosynthetic 


165 






oxygen-evoKing protein 






SPIFCG 


Ptastocyanin 


165 




snpsM 


33 kOa photosynthetic water 


166 






oxidation complex precursor 








Qlycolate oxidase 


167 


Vkiafaba 


VFALBA 


Leghemoglobin 


168 




VFALEB4 


Lcgumin B 


169 






Vidlltn 


170 



Data was obtained from GenBank (release 55) or, when no Genbank file name is specified, directly from the 
published source. 

Table 3

Codon usage in pooled sequences of higher plant genes.

                                                                Monocots
                                                                No Storage
                  Plants        Dicots        Monocots          Proteins
                  n = 207       n = 154       n = 53            n = 39
AmAcid  Codon     No.    %      No.    %      No.    %          No.    %

Gly     GGG        731  15       449  12      282   21          267   22
Gly     GGA       1629  32      1399  38      230   17          193   16
Gly     GGT       1477  29      1231  34      246   18          207   17
Gly     GGC       1179  24       596  16      583   44          543   45
Glu     GAG       2102  57      1498  51      604   75          568   79
Glu     GAA       1616  43      1419  49      197   25          154   21
Asp     GAT       1458  50      1269  58      189   27          162   24
Asp     GAC       1441  50       927  42      514   73          503   76
Val     GTG       1354  31       956  29      398   36          338   37
Val     GTA        491  11       402  12       89    8           52    6
Val     GTT       1478  34      1270  39      208   19          154   17
Val     GTC       1045  24       642  20      403   37          362   40
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AmAcid Codon 


Plants 
No. % 


DicoU 
No. % 


MODOCOlS 

No. % 


Monocots 
No Storage 
Protlios 

No. % 


Ala GCG 
Ala OCA 
Ala GOT 
Ala GCG 


546 U 
11S6 22 
1901 37 
1548 30 


211 6 
916 25 
1530 42 
960 27 


335 22 
240 16 
371 24 
588 38 


284 24 
137 12 
254 21 
510 43 


Arg AGG 
Arg AGA 
Scr AGT 
Scr AGO 


742 26 
707 24 
581 13 
887 20 


540 25 
633 30 
493 14 
605 18 


202 26 
74 9 
88 8 

282 26 


163 25 

50 8 

51 6 
225 27 


Lys AAG 

1 v« AAA 

Asn AAT 
Asn AAC 


2241 66 

1137 41 
1646 59 


1600 61 

IMA 10 

982 45 
1188 55 


641 86 

1U5 14 

155 25 
458 75 


609 86 

TO 14 

106 23 
356 77 


Met ATG 

1l» ATA 
He AIA 

lie ATT 
lie ATC 


1356 100 
jvj Id 
1241 40 
1374 44 


982 100 
4iy lo 
1051 45 
873 37 


374 100 
oo 11 
190 24 
501 65 


311 100 
48 8 
128 21 
433 71 


Thr ACG 
1 nr Ai^A 
Thr ACT 
Thr ACC 


343 U 

990 31 
1082 34 


184 8 
o3o 27 
842 35 
721 30 


159 21 
109 14 
148 19 
361 46 


146 22 
73 11 
116 18 
319 49 


Trp TGG 
End TGA 
Cy$ TGT 
Cys TGC 


790 100 
68 33 
432 40 
647 60 


605 100 
50 33 
338 44 
423 56 


185 100 
18 34 
94 30 

224 70 


171 100 
15 37 
69 27 

185 73 


End TAG 
End TAA 
Tyr TAT 
Tyr TAC 


48 24 
88 43 
743 37 
1267 63 


29 19 
72 48 
630 43 
838 57 


19 36 
16 30 
113 21 
429 79 


n 27 
14 35 
68 16 
354 84 


Leu TTG 
Leu TTA 
Phc TTT 

Phc rrc 


118S 22 
412 8 
1047 40 
1597 60 


965 26 
363 10 
887 45 
1106 55 


220 14 
49 3 
160 25 
491 75 


108 9 
19 2 
95 20 

392 80 


Scr TCG 
Scr TCA 
Scr TCT 
Scr TCC 


343 8 
768 17 
1009 22 
896 20 


192 6 
649 19 
844 25 
621 18 


151 14 
119 11 
165 15 
275 26 


139 17 
67 8 
112 13 
237 29 


Arg CGG 

Aro CCtA 

Arg COT 
Arg COC 


198 7 
214 7 
534 18 
520 18 


95 5 
181 8 
441 21 
241 11 


103 13 

93 12 
279 36 


94 14 

67 10 
268 40 


Gin CAG 
Oln CAA 
His CAT 
His CAC 


1465 43 
1912 57 

575 48 
625 52 


787 41 
1123 59 
465 54 
398 46 


678 46 

TOT CA 

7o7 54 
110 33 
227 67 


457 60 
305 40 
85 30 
202 70 


Leu CTG 
Leu CTA 
Leu CTT 
Leu CTC 


792 15 
434 8 
1273 24 
1189 22 


347 9 
281 8 
1032 28 
691 19 


445 28 
153 9 
241 IS 
498 31 


371 33 
59 5 
151 13 
434 38 


Pro CCG 
Pro CCA 
Pro CCT 
Pro CCC 


492 13 
1507 39 
1063 28 

755 20 


236 9 
1126 42 
874 32 
469 17 


256 23 
381 34 
189 17 
286 26 


224 30 
202 27 
118 15 
212 28 



n = the number of DNA sequences in the sample. No. is the number of occurrences of a
given codon in the sample. % is the percent occurrence for each codon within a given amino
acid in the sample. (See text for description of the samples).
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Table 4 

Codon usage in pooled sequences of higher plant genes. 







Soybean 


Maize 


CAB 


RuBPSSU 






n ~ 


*y 


D = 


*0 


n — 


1 / 


n — 




AmAcId 


CodOD 


No. 


19 


No. 




NO, 


tt. 


no. 






OCiO 


90 


16 


95 


16 


42 


8 


16 


9 


Gly 


GGA 


107 


3J 


to 


1 1 


1 Al 
10/ 


1*7 




<i 

ji 




GGT 


lyj 


11 
JJ 


I'M 

izy 


^1 

Zl 


iOA. 

lyo 


J/ 




1 / 




GGC 


102 


lo 


MJZ 


Ml 


1 IQ 


Li 






Gtu 


GAG 


310 


51 


368 


81 


178 


71 


139 


74 


Gtu 


GAA 


301 


49 


84 


19 


73 


29 




tit 
20 


Asp 




244 


62 


87 


24 


53 


29 


39 


33 


Asp 




150 


38 


277 


76 


128 


71 


79 


67 


Val 


GTG 


219 


37 


227 


40 


62 


21 


93 


36 


Val 


GTA 


71 


13 


36 


6 


24 


8 


7 


3 


Val 


Oil 


227 


38 


99 


17 


118 


39 


87 


33 


VUl 




75 


12 


209 


37 


96 


32 


73 


Zo 


Ala 


(jCG 


42 


8 


211 


24 


26 


5 


16 


5 


Ala 


GCA 


170 


30 


115 


13 


61 


12 


42 


14 


Ala 

/\ia 


nn* 
\%\^ 1 


208 


37 


237 


27 


225 


45 


110 


38 


A In 

Ala 




139 


25 


324 


36 


192 


38 


125 


43 


Arg 


AGG 


88 


22 


109 


26 


21 


15 


17 


12 


Afg 


A(iA 


119 


30 


28 


7 


33 


24 


31 


21 


Scr 


A(5T 


117 


18 


29 


5 


15 


5 


21 


8 


Scr 


AGO 


129 


20 


ISO 


28 


84 


27 


56 


22 


lys 


AAG 


278 


58 


367 


90 


186 


85 


176 


85 


Lys 


AAA 


204 


42 


43 


10 


34 


15 


30 


15 


Asn 


AAT 


168 


40 


56 


19 


52 


30 


35 


26 


Asn 


AAC 


248 


60 


246 


81 


119 


70 


102 


74 


Met 


ATG 


184 


too 


210 


100 


111 


100 


115 


too 


lie 


ATA 


109 


24 


35 


8 


10 


6 


1 


1 


lie 


ATT 


219 


49 


100 


24 


61 


40 


63 


43 


lie 


ATC 


118 


27 


284 


68 


83 


54 


83 


56 


Thr 


ACG 


29 


7 


114 


26 


10 


6 


5 


3 


Thr 


ACA 


128 


29 


48 


11 


35 


22 


21 


13 


Thr 


ACT 


151 


35 


72 


16 


61 


38 


59 


36 


Thr 


ACC 


124 


29 


212 


47 


54 


34 


79 


48 


Trp 


T(JG 




185" 


84 


IM 


^ 


ito 


1^ 


100 


End 


TGA 


5 


18 


7 


26 


15 


88 


2 


11 


Cys 


TGT 


63 


40 


29 


21 


16 


39 


7 


9 


Cys 


TGC 


95 


60 


110 


79 


25 


61 


72 


91 


End 


TAG 


9 


32 


14 


52 


0 


0 


1 


5 


End 


TAA 


14 


50 


6 


22 


2 


12 


16 


84 


Tyr 


TAT 


135 


49 


38 


14 


23 


19 


17 


10 


Tyr 


TAG 


139 


51 


240 


86 


99 


81 


151 


90 


Leu 


TTG 


175 


24 


116 


13 


118 


30 


79 


36 


Uu 


TTA 


79 


11 


28 


3 


15 


4 


6 


3 


Phe 


TTT 


166 


46 


69 


20 


106 


40 


32 


20 


Phc, 


TTC 


193 


54 


278 


80 


160 


60 


125 


80 


Ser 


TCO 


39 


6 


89 


16 


17 


5 


10 


4 


Scr 


TCA 


125 


19 


56 


10 


46 


15 


48 


19 


Scr 


TCT 


140 


22 


75 


14 


83 


26 


33 


13 


Scr 


TCC 


94 


15 


145 


27 


69 


22 


89 


34 


Arg 


CGG 


17 


4 


54 


13 


7 


5 




1 


Arg 


a;A 


41 


10 


13 


3 


6 


4 


3 


2 


Arg 


CGT 


70 


18 


45 


11 


50 


36 


48 


33 


Arg 


C(iC 


64 


16 


165 


40 


20 


15 


44 


31 


Gin 


CAG 


181 


41 


311 


59 


36 


37 


75 


51 


Gin 


CAA 


261 


59 


219 


41 


60 


62 


73 


49 


His 


CAT 


124 


63 


49 


29 


16 


32 


4 


18 


His 


CAC 


73 


37 


122 


71 


34 


68 


18 


82 
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Soybean 


Milze 


CAB 


RuBP SSU 


AmAcId 


Codon 


No. 


% 


No. 


% 


No. 


% 


No. % 


Uu 


CTG 


75 


10 


289 


31 


29 


1 


27 12 


Uu 


CTA 


60 


8 


78 


9 


6 


2 


9 4 


Uu 


CTT 


184 


26 


147 


16 


134 


34 


56 25 


Leu 


CTC 


148 


21 


261 


28 


88 


23 


43 20 


Pro 


CC(i 


55 


8 


149 


27 


29 


10 


13 6 


Pro 


CCA 


346 


47 


126 


23 


137 


47 


72 34 


Pro 


CCT 


236 


32 


109 


20 


73 


25 


60 29 


Pro 


CCC 


95 


13 


164 


30 


54 


18 


66 31 



n = the number of DNA sequences in the sample. No. is the number of occurrences of a
given codon in the sample. % is the percent occurrence for each codon within a given amino acid 
in the sample. (See text for description of the samples). 

The G ending codons for Thr, Pro, Ala and Ser are avoided in both monocots and dicots 
because they contain C in codon position II. The CG dinucleotide is strongly avoided in
plants (23) and other eukaryotes (27), possibly due to regulation involving methylation. In 
dicots, XCG is always the least favored codon, while in monocots this is not the case. The 
doublet TA is also avoided in codon positions II and III in most eukaryotes (27), and this is 
true of both monocots and dicots. 

Grantham and colleagues (18) have developed two codon choice indices to quantify CG 
and TA doublet avoidance in codon positions II and III. XCG/XCC is the ratio of codons 
having C as base II of G-ending to C-ending triplets, while XTA/XTT is the ratio of A- 
ending to T-ending triplets with T as the second base. These indices have been calculated 
for the plant data in this paper (Table 5) and support the conclusion that monocot and dicot 
species differ in their use of these dinucleotides. This pattern of synonymous codon usage is
not dependent on the inclusion of storage protein genes in the monocot sample, since the 
pooled codon usage data for monocots without storage proteins is even less like the dicot 
pattern (Table 3). Not surprisingly, the pooled plant sample resembles the dicot pattern 
more than that of monocots, since almost three times as many dicot sequences as monocot
sequences were available.
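
The two doublet-avoidance indices defined above reduce to simple ratios of pooled codon counts. The sketch below is an illustration under stated assumptions: the function name `doublet_index` and the dictionary layout are invented for the example, and the counts are the dicot values for the G- and C-ending codons of Ala, Thr, Ser and Pro as read from Table 3. As in Table 5, the ratio is multiplied by 100.

```python
def doublet_index(codon_counts, second_base, numerator_end, denominator_end):
    """Ratio (x100) of codon counts sharing a second base, by third base.

    XCG/XCC is doublet_index(counts, "C", "G", "C");
    XTA/XTT is doublet_index(counts, "T", "A", "T").
    """
    numerator = sum(
        n for codon, n in codon_counts.items()
        if codon[1] == second_base and codon[2] == numerator_end
    )
    denominator = sum(
        n for codon, n in codon_counts.items()
        if codon[1] == second_base and codon[2] == denominator_end
    )
    return 100.0 * numerator / denominator


# Dicot counts for XCG and XCC codons, taken from Table 3.
dicot_counts = {
    "GCG": 211, "GCC": 960,   # Ala
    "ACG": 184, "ACC": 721,   # Thr
    "TCG": 192, "TCC": 621,   # Ser
    "CCG": 236, "CCC": 469,   # Pro
}

if __name__ == "__main__":
    # 823 / 2771 = 0.297, i.e. 30 after rounding, the dicot XCG/XCC entry in Table 5.
    print(round(doublet_index(dicot_counts, "C", "G", "C")))
```

A value well below 100 indicates that the G-ending codon is used less often than its C-ending counterpart, that is, that the CG doublet is avoided.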

For two species, soybean and maize, larger sequence samples were available (29 and 28 
genes respectively), so that species-specific codon usage profiles could be calculated (Table
4). Not surprisingly, the maize codon usage pattern resembles that of monocots in general, 
since these sequences represent over half of the monocot sequences available. The codon 
profile of the maize subsample is even more strikingly biased in its preference for G + C in 

Table 5

Avoidance of CG and UA doublets in codon positions II-III.

Group                          XCG/XCC    XTA/XTT

Plants                            40         37
Dicots                            30         35
Monocots                          61         47
Monocots, no storage protein      62         34
Maize                             67         43
Soybean                           37         41
RuBPC SSU                         18          9
CAB                               22         13

XCG/XCC and XTA/XTT values are multiplied by 100.
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codon position III. However, the soybean codon usage pattern is almost identical to the 
general dicot pattern, even though it represents a much smaller portion of the entire dicot 
sample. 

In order to determine whether the coding strategy of highly expressed genes such as 
RuBPC SSU and CAB is more biased than that of plant genes in general, we calculated 
codon usage profiles for subsets of these genes (20 and 17 sequences respectively) (Table 4). 
The RuBPC SSU and CAB pooled samples are characterized by stronger avoidance of the 
codons XCG and XTA than in the larger monocot and dicot samples (Tables 4 and 5). 
Although most of the genes in these subsamples are dicot in origin (17/20 and 15/17), their 
codon profile resembles that of the monocots in that G + C is preferred in the degenerate
base III. 

The use of pooled data for highly expressed genes may obscure identification of species- 
specific patterns in codon choice. Therefore, we have tabulated the codon choices of 
individual genes for RuBPC SSU (Table 6) and CAB (Table 7). The preferred codons of 
the maize and wheat genes for RuBPC SSU and CAB are more restricted in general than 
are those of the dicot species. Matsuoka et al. (28) noted the extreme codon bias of the 
maize RuBPC SSU gene as well as two other highly expressed genes in maize leaves, CAB
and phosphoenolpyruvate carboxylase. These genes almost completely avoid the use of 
A+T in codon position III, although this codon bias was not as pronounced in non-leaf 
proteins such as ADH, zein 22 kDa subunit, sucrose synthetase and ATP/ADP translocator. 
Since the wheat SSU and CAB genes have a similar pattern of codon preference, this may 
reflect a common monocot pattern for these highly expressed genes in leaves. The CAB 
gene for Lemna and the RuBPC SSU genes for Chlamydomonas share a similar extreme
preference for G + C in codon position III. 

In dicot CAB genes, however, A+T degenerate bases are preferred by some synonymous
codons (e.g. GCT for Ala, CTT for Leu, GGA and GGT for Gly). In general the G + C
preference in position III is less pronounced for both RuBPC SSU and CAB genes in dicots 
than in monocots. 

Discussion:

Because of the degenerate nature of the genetic code, only part of the variation 
contained in a gene is expressed in its protein. It is clear that variation between degenerate 
base frequencies is not a neutral phenomenon since systematic codon preferences have 
been reported for bacterial, yeast and mammalian genes. Analysis of a large group of plant 
gene sequences indicates that synonymous codons are used differently by monocots and 
dicots. These patterns are also distinct from those reported for E. coli, yeast and man
(13,18). 

In general, the plant codon usage pattern more closely resembles that of man and other 
higher eukaryotes than that of unicellular organisms, due to the overall preference for G+C
content in codon position III (13,18). Monocots in this sample share the most commonly 
used codon for 13 of 18 amino acids as that reported for a sample of human genes (18), 
although dicots favor the most commonly used human codon in only 7 of 18 amino acids. 
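
Comparisons of this kind (how many amino acids share the same most commonly used codon in two samples) can be sketched as follows. This is an illustration only: the function names are hypothetical, and because no human codon counts are given in this text, the example compares the monocot and dicot columns of Table 3 for three amino acids instead.

```python
def preferred_codons(usage_by_amino_acid):
    """Map each amino acid to its most frequently used codon."""
    return {
        amino_acid: max(counts, key=counts.get)
        for amino_acid, counts in usage_by_amino_acid.items()
    }


def shared_preferences(sample_a, sample_b):
    """Amino acids for which both samples prefer the same codon."""
    pref_a = preferred_codons(sample_a)
    pref_b = preferred_codons(sample_b)
    return sorted(aa for aa in pref_a if pref_b.get(aa) == pref_a[aa])


# Glu, Asp and Val counts from Table 3 (monocot and dicot columns).
monocot_sample = {
    "Glu": {"GAG": 604, "GAA": 197},
    "Asp": {"GAT": 189, "GAC": 514},
    "Val": {"GTG": 398, "GTA": 89, "GTT": 208, "GTC": 403},
}
dicot_sample = {
    "Glu": {"GAG": 1498, "GAA": 1419},
    "Asp": {"GAT": 1269, "GAC": 927},
    "Val": {"GTG": 956, "GTA": 402, "GTT": 1270, "GTC": 642},
}

if __name__ == "__main__":
    shared = shared_preferences(monocot_sample, dicot_sample)
    print(len(shared), "of", len(monocot_sample), "amino acids:", shared)
```

Applied to all 18 amino acids with synonymous codons, the length of the returned list gives the "n of 18" figures quoted in the text.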

Several earlier discussions of plant codon usage have focussed on the differences between 
codon choice in plant nuclear genes and in chloroplasts (18,23). Chloroplasts differ from the 
nuclear genome of higher plants in that they encode only 30 tRNA species (29, 30). Since 
chloroplasts have restricted their tRNA genes, the use of preferred codons by chloroplast-
encoded proteins appears more extreme. However, a positive correlation has been reported 
between the level of isoaccepting tRNA for a given amino acid and the frequency with 
which this codon is used in the chloroplast genome (31). 

Our analysis of the expanded plant sample confirms earlier reports that the nuclear and 
chloroplast genomes in plants have distinct coding strategies. The codon usage of monocots 
in this sample is distinct from chloroplast usage, sharing the most commonly used codon for 
only 1 of 18 amino acids. Dicots in this sample share the most commonly used codon of 
chloroplasts in only 4 of 18 amino acids. In general, the chloroplast codon profile more 
closely resembles that of unicellular organisms, with a strong bias towards the use of A+T in 
the degenerate third base. 

In unicellular organisms, highly expressed genes use a smaller subset of codons than do 
weakly expressed genes, although yeast and E. coli use distinct preferred codons in some 
cases. Sharp and Li (12) report that codon usage in 165 E. coli genes reveals a positive 
correlation between high expression and increased codon bias. Bennetzen and Hall (15) 
and others (12-14, 17, 18) have described a similar trend in codon selection in yeast. Codon 
usage in these highly expressed genes correlates with the abundance of isoaccepting tRNAs 
in both yeast and E. coli. If, as Ikemura (16) has proposed, the good fit of abundant yeast 
and E. coli mRNA codon usage to isoacceptor tRNA abundance promotes high translation 
levels and high steady state levels of these proteins, then the potential for high levels of 
expression of plant genes in yeast or E. coli could be limited by their codon usage. 
Hoekema et al. (24) report that replacement of the 25 most favored yeast codons with rare 
codons in the 5' end of the highly expressed gene PGK1 leads to a decrease in both mRNA 
and protein. These results indicate that codon bias should be considered when engineering 
high expression of foreign genes in yeast and other systems. 

A number of researchers have attempted to express plant genes in yeast (32-34) and E. 
coli (35-37). In the case of wheat α-gliadin (32), α-amylase (33) genes, and maize zein genes 
(34), low levels of expression have been reported in yeast. Neill et al. (32) have suggested 
that the low levels of expression of α-gliadin in yeast may be due in part to codon usage bias, 
since α-gliadin codons for Phe, Leu, Ser, Gly, Tyr and especially Glu do not correlate well 
with the abundant yeast isoacceptor tRNAs. In E. coli, however, soybean glycinin A2 (35) 
and wheat RuBPC SSU (36, 37) are expressed adequately. 

Not much is known about the makeup of tRNA populations in plants. Viotti et al. (38) 
report that maize endosperm actively synthesizing zein, a storage protein rich in glutamine, 
leucine, and alanine, is characterized by higher levels of accepting activity for these three 
amino acids than are maize embryo tRNAs. This may indicate that the tRNA population of 
specific plant tissues may be adapted for optimum translation of highly expressed proteins 
such as zein. To our knowledge, no one has experimentally altered codon bias in highly 
expressed plant genes to determine the possible effects on protein translation in plants or to 
check the effects on the level of expression. Our data indicate that the highly biased RuBPC 
SSU and CAB genes would be good candidates for such an experiment. 
The antibiotic fusidic acid and certain closely related steroidal compounds 
are potent competitive inhibitors of the type I variant of chloramphenicol 
acetyltransferase (CATI). In the absence of crystallographic data for CATI, 
the structural determinants of steroid binding were identified by (1) 
construction in vitro of genes encoding chimaeric enzymes containing 
segments of CATI and the related type III variant (CATIII) and (2) 
site-directed mutagenesis of the gene encoding CATIII, followed by kinetic 
characterisation of the substituted variants. Replacement of four residues 
of CATIII (Gln92, Asn146, Tyr168 and Ile172) by their equivalents from 
CATI yields an enzyme variant that is susceptible to competitive inhibition 
by fusidate with respect to chloramphenicol (Ki = 5.4 µM). The structure of 
the complex of fusidate and the Q92C/N146F/Y168F/I172V variant, 
determined at 2.2 Å resolution by X-ray crystallography, reveals the 
inhibitor bound deep within the chloramphenicol binding site and in close 
proximity to the side-chain of His195, an essential catalytic residue. The 
aromatic side-chain of Phe146 provides a critical hydrophobic surface 
which interacts with non-polar substituents of the steroid. The remaining 
three substitutions act in concert both to maintain the appropriate 
orientation of Phe146 and via additional interactions with the bound 
inhibitor. The substitution of Gln92 by Cys eliminates a critical hydrogen 
bond interaction which constrains a surface loop (residues 137 to 142) of 
wild-type CATIII which must move in order for fusidate to bind to the 
enzyme. Only two hydrogen bonds are observed in the CAT-fusidate 
complex, involving the 3-α-hydroxyl of the A-ring and both the hydroxyl of 
Tyr25 and NE2 of His195, both of which are also involved in hydrogen 
bonds with substrate in the CATIII-chloramphenicol complex. In the acetyl 
transfer reaction catalysed by CAT, NE2 of His195 serves as a general base 
in the abstraction of a proton from the 3-hydroxyl of chloramphenicol as 
the first chemical step in catalysis. The structure of the CAT-inhibitor 
complex suggests that deprotonation of the 3-α-hydroxyl of bound fusidate 
by this mechanism could produce an oxyanion nucleophile analogous to 
that seen with chloramphenicol, but one which is incorrectly positioned to 
attack the thioester carbonyl of acetyl-CoA, accounting for the observed 
failure of CAT to acetylate fusidate. 
Structure of a CAT-Steroid Complex 



Introduction 

Steroid hormones and related steroidal molecules 
are known to bind to a range of macromolecular 
targets in eukaryotes including nuclear receptors, 
intracellular enzymes involved in steroid biosyn- 
thesis or metabolism, and sex-hormone and corti- 
costeroid binding globulins present in plasma 
(Parker, 1993). Although the structure of the 
complex between progesterone and an anti-steroid 
Fab' fragment (Arevalo et al., 1993), and unliganded 
structures of ketosteroid isomerase (Westbrook 
et al., 1984), uteroglobin (Morize et al., 1987), and a 
bacterial β-hydroxysteroid dehydrogenase (Ghosh 
et al., 1991), have been determined by X-ray 
crystallography, the breadth of the spectrum of 
structural motifs for steroid binding by proteins 
remains unknown. In fact, the sole example of the 
structural determination of the complex of a 
steroidal ligand and an in vivo protein target is that 
recently determined for dehydroisoandrosterone 
bound to a bacterial cholesterol oxidase (Li et al., 
1993), a flavin-dependent enzyme believed to have 
a role in the degradation of 3-β-hydroxysteroids, 
permitting the bacterium to utilise cholesterol as a 
sole source of carbon and energy. 

It is well documented that the type I variant 
(CATI) of the enzyme chloramphenicol acetyltransferase 
(CAT; EC 2.3.1.28), a common effector of 
bacterial resistance to the antibiotic chloramphenicol 
(Shaw, 1967), can also confer resistance to the 
steroidal antibiotic fusidic acid when expressed in 
an Escherichia coli mutant which, unusually for 
Gram-negative bacteria, is fusidate-sensitive (Bennett 
& Shaw, 1983). Resistance to chloramphenicol 
is achieved via the enzyme-catalysed transfer of an 
acetyl group from acetyl-CoA to the primary 
hydroxyl group of the substrate, yielding 
3-acetylchloramphenicol which fails to bind to, and 
thus no longer inhibits, the peptidyl-transferase 
centre of prokaryotic ribosomes. Fusidic acid is also 
an inhibitor of bacterial protein synthesis but is 
known to act at the elongation phase via stabilisation 
of the complex of elongation factor G and GDP 
(reviewed by Cundliffe, 1981). As binding by CATI 
leads neither to acetylation nor to any other covalent 
modification of fusidate, it is likely that the 
resistance mechanism is simply one of sequestering 
the antibiotic from its target (Bennett & Shaw, 1983). 

Genes encoding CAT have been isolated from 
many bacterial genera and nucleotide sequences of 
some 30 different natural variants are known, 
demonstrating 25 to 98% amino acid identity in 
pairwise comparisons. None are homologous in 
primary structure to known steroid-binding pro- 
teins, nor, with the exception of the acetyltrans- 
ferase component of the pyruvate dehydrogenase 
complex which has the same tertiary fold as CAT 
(Mattevi et al., 1992), are they known to be related 
structurally to other proteins of any type. All CAT 
variants are homotrimers (3 × 25 kDa) with three 
active sites located in the clefts at the subunit 
interfaces (Shaw & Leslie, 1991) and operate via a 
rapid-equilibrium ternary complex mechanism with 
random order of addition of substrates (Kleanthous 
& Shaw, 1984). The imidazole of His195 acts as a 
general base to abstract a proton from the primary 
hydroxyl of chloramphenicol, followed by attack of 
the resulting oxygen nucleophile at the carbonyl of 
the acetyl-CoA thioester. The resulting tetrahedral 
oxyanion intermediate, stabilised by hydrogen-bonding 
with the hydroxyl of Ser148, then collapses 
to yield the products 3-acetylchloramphenicol and 
CoA. Such a mechanism is supported by the results 
of chemical modification experiments (Kleanthous 
et al., 1985) as well as X-ray crystallographic (Leslie, 
1990) and site-directed mutagenesis (Lewendon 
et al., 1990, 1994) studies of the type III variant of 
CAT (CATIII) (Murray et al., 1988). In contrast, 
fusidate resistance appears to be a unique property 
Figure 1. Chemical structure of fusidic acid and chloramphenicol. 

Figure 2. Sequence alignment of CATIII and CATI. The primary sequences of CATIII and CATI were optimally aligned 
and numbered as described by Shaw & Leslie (1991). Secondary structural motifs (helices and strands) observed in the 
crystal structure of CATIII (Leslie, 1990), and the homologous regions of CATI, are underlined. The 17 residues which 
form the chloramphenicol binding site are shown in bold text. Positions of residue identity are indicated by * and those 
amino acids which are conserved in all known CAT variants by #. Double arrows above the sequence indicate the 
crossover points exploited in the construction of chimaeric CAT variants. 



of the CATI enzyme, encoded by the transposon 
Tn9 and many different enterobacterial R-plasmids. 
Fusidate binding to CATI is competitive with 
respect to chloramphenicol, despite the lack of 
obvious structural equivalence between the two 
ligands (Figure 1), and extends to various fusidate 
derivatives and related steroidal molecules (Bennett 
& Shaw, 1983). 

An understanding of the determinants of binding 
of ligands which are not substrate analogues to 
proteins is a prerequisite to rational drug design 
and to the production of novel enzymes by protein 
engineering. In the case of steroid binding by CATI, 
such an objective could in principle be achieved by 
determination of the structure of the enzyme/fusidate 
complex by X-ray crystallography. However, 
a prolonged and exhaustive effort has failed to 
yield crystals of CATI with suitable diffraction 
properties. Efforts to model the binding of fusidate 
to CATI using the known structure of the complex 
of CATIII and chloramphenicol (Leslie, 1990; M.J.S., 
unpublished experiments) have been frustrated by 
two obstacles: a lack of residue conservation 
between CATI and CATIII (46% identity), even in 
the substrate binding sites, and the fact that a 
chloramphenicol site is located deep within each of 
the three inter-subunit clefts and could, in principle, 
be occluded (rather than occupied) by fusidate 
bound on the trimer surface. Random mutagenesis 
of the genes encoding either CATI or CATIII was 
considered inappropriate due to the absence of a 
direct or facile selection for fusidate-sensitive 
mutants of the former, and the consideration that 
single point mutations were unlikely to confer 
fusidate resistance upon E. coli expressing CATIII. 
However, starting with the assumption that the 
tertiary folds of CATI and CATIII were likely to be 
essentially the same, and hence that steroid binding 
affinity is conferred primarily as a consequence of 
differences in amino acid side-chains, it seemed 
reasonable that site-directed mutagenesis of CATIII 
could be useful in mapping the fusidate binding 
site. The desired result was achieved by the 
construction in vitro of recombinants expressing 
chimaeric CAT enzymes, followed by directed 
mutagenesis to produce variants of CATIII which 
bind fusidic acid with high affinity and are 
amenable to crystallographic analysis. 

Results and Discussion 

Production and characterisation of CATI/CATIII 
chimaeric enzymes 

Steady state kinetic analysis of the inhibition of 
wild-type CATI and CATIII by sodium fusidate 
indicated that the steroidal inhibitor is bound 
~200-fold more strongly by the former (Ki = 1.5 µM 
and 279 µM, respectively). In each case binding is 
competitive with respect to the substrate chloramphenicol. 
However, because chloramphenicol binds 
in each of three deep clefts on the enzyme surface 
of CATIII (and presumably CATI also) it cannot be 
presumed a priori that fusidate binding is determined 
solely by the amino acid side-chains which 
form the substrate binding pocket. Calculations 
(using a sphere of radius 1.4 Å) indicated that in the 
CATIII-chloramphenicol complex (Leslie, 1990) only 
one chlorine atom and the two oxygen atoms of the 
p-nitro substituent are solvent-accessible. Thus, 
binding of fusidate to non-conserved surface 
residues flanking the entrance to the chloramphenicol 
site (residues 26 to 30, 137 to 142 and 163 to 167; 
Figure 2) might in principle result in competitive 
inhibition by simply precluding access of substrate 
to the catalytic centre of CATI. Of the 17 amino acid 
side-chains which form the chloramphenicol binding 
site of CATIII, only nine are retained in CATI, of 
which four are conserved in all CAT variants 
(Figure 2; Zhao & Aoki, 1992). As a first step in 
delineating regions of CATI which contained 
determinants of high affinity steroid binding, we 
constructed a series of recombinant CAT genes 
encoding chimaeric enzymes. 
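
The meaning of competitive inhibition and of the two inhibition constants quoted above can be made concrete with a short sketch. It uses the textbook rate law for a competitive inhibitor, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]), which is an assumption here rather than the fitting procedure of the paper. The Ki values are those given in the text; the assay conditions (substrate at Km, 10 µM fusidate, Vmax normalised to 1) are hypothetical.

```python
def competitive_rate(s, i, vmax, km, ki):
    """Initial rate under the textbook competitive-inhibition rate law.

    s and km must share one concentration unit; i and ki must share one.
    """
    return vmax * s / (km * (1.0 + i / ki) + s)


# Fusidate inhibition constants quoted in the text (µM).
KI_FUSIDATE = {"CATI": 1.5, "CATIII": 279.0}

# Ratio of the two constants: how much more tightly CATI binds fusidate.
fold_tighter = KI_FUSIDATE["CATIII"] / KI_FUSIDATE["CATI"]
print(f"CATI binds fusidate about {fold_tighter:.0f}-fold more tightly")

# Hypothetical conditions: substrate at Km (expressed in units of Km),
# Vmax normalised to 1, and 10 µM fusidate.
for name, ki in KI_FUSIDATE.items():
    v = competitive_rate(s=1.0, i=10.0, vmax=1.0, km=1.0, ki=ki)
    print(f"{name}: fraction of Vmax at 10 µM fusidate = {v:.3f}")
```

Because the inhibitor only raises the apparent Km, its effect in this model is overcome by raising the substrate concentration. That behaviour is the kinetic signature of competitive binding.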

Two crossover points were selected for construction 
of CATI/CATIII chimaeras at locations which 
divide the CAT open reading frames approximately 
into thirds. The first crossover point, in a short 3₁₀ 
helix on the enzyme surface between residues 72 
and 73, was chosen because this site is known to 
tolerate insertions of additional amino acid residues 
while retaining activity (Betz & Sadler, 1981; I.A.M., 
unpublished experiments). The second was located 
between two highly conserved surface residues 
(Pro151 and Trp152) which form part of a reverse 
turn between β-strands G and H. The fact that the 
two CAT variants are 46% identical in sequence and 
readily form heterotrimers of CATI and CATIII 
subunits that are catalytically competent (Packman 
& Shaw, 1981; Day et al., 1995) raised the possibility 
that some of the chimaeric enzymes might be active, 
notwithstanding the potential for folding defects 
and/or inappropriate intra- and inter-monomer 
interactions. Of six possible chimaeric proteins 
encoded by crossovers at these positions, only five 
produced CAT protein, four of which were able to 
acetylate chloramphenicol and three were characterised 
in respect of inhibition by sodium fusidate. 
Enzymes in which residues 6 to 71 of CATIII 
replaced residues 1 to 71 of CATI and in which 
residues 152 to 221 of CATI replaced residues 152 to 
219 of CATIII (the III-I-I and III-III-I chimaeras) 
were purified by standard affinity chromatography 
methods and their steady state kinetic parameters 
for acetyl transfer to chloramphenicol and inhibition 
by fusidate were determined (Table 1). The former 
is inhibited by fusidate (Ki = 2.0 µM) almost as 
effectively as is wild-type CATI (1.5 µM), whereas 
the latter is ~twofold less sensitive to fusidate 



Table 1. Steady state kinetic parameters determined for 
wild-type and chimaeric CAT variants 

Variant             kcat (s⁻¹)   Km Cm (µM)   Ki FA (µM) 
Wild-type CATI      97 (a)       11 (a)       1.5 
Wild-type CATIII    599 (b)      11.6 (b)     279 
III-I-I             9.3          19           2.0 
III-III-I           191          65           505 
III (I139-143)      534          17           193 

III-I-I. Chimaeric CAT variant wherein residues 1 to 71 of CATI 
were replaced by residues 6 to 71 of CATIII. 

III-III-I. Chimaeric CAT variant wherein residues 152 to 219 of 
CATIII were replaced by residues 152 to 221 of CATI. 

III (I139-143). Chimaeric CAT variant wherein loop 139 to 143 
(VTPEN) of CATIII is replaced by the equivalent loop (FIEN) of 
CATI. 

Kinetic parameters were determined as described in Materials 
and Methods. Parameters are the mean of a minimum of three 
independent determinations, and standard error values are 15% 
(or less) of the quoted values. Km values for the binding of the 
second substrate, acetyl-CoA, are within threefold of the values 
of the wild-type enzymes in the cases of all CAT variants 
analysed in the present study (data not shown). 

(a) Data from Murray et al. (1991b). 

(b) Data from Lewendon et al. (1988). 



(Ki = 505 µM) than wild-type CATIII. A third 
chimaera (III-I-III) wherein residues 72 to 151 of 
CATIII are replaced by their equivalents from CATI 
could only be assayed in crude cell-free extracts 
by means of a very sensitive radiometric assay 
(Gorman et al., 1982) but did display fusidate 
sensitivity intermediate to that of the two wild-type 
enzymes assayed under similar conditions (data not 
shown). Taken together, such data suggest that the 
principal determinants of high affinity fusidate 
binding in CATI cannot reside in the N-terminal 
third of the protein and, additionally, that the 
C-terminal third of CATI is insufficient by itself to 
confer CATI-like affinity for fusidate, only binding 
the ligand in the context of residues 72 to 151 of that 
variant. It seemed likely, therefore, that differences 
in steroid affinity are a consequence 
of substitutions of some of those residues (92, 105, 
146, 168 and 172) known to be variable in the 
chloramphenicol binding site and/or in the surface 
loops 137 to 142 and 163 to 167. However, 
replacement of residues 139 to 143 (Val-Thr-Pro-Glu-Asn) 
of CATIII by the shorter loop (Phe-Ile-Glu-Asn) 
from CATI yields an enzyme with kinetic 
parameters that are almost identical to those of 
wild-type CATIII (Table 1). Therefore, site-directed 
mutagenesis of each of the eight non-conserved 
residues in the chloramphenicol binding site of 
CATIII was used in an attempt to further define 
fusidate binding residues. 

Site-directed mutagenesis of residues of the 
chloramphenicol binding site of CATIII 

Eight of the 17 amino acid side-chains which form 
the chloramphenicol binding site of CATIII are 
substituted (Xaa) in CATI: Phe24(Ala), Tyr25(Phe), 
Leu29(Ala), Gln92(Cys), Ala105(Ser), Asn146(Phe), 
Tyr168(Phe) and Ile172(Val). The Y25F mutant was 
Table 2. Steady state kinetic parameters determined for 
singly and multiply substituted CATIII variants 

Variant                             kcat (s⁻¹)   Km Cm (µM)   Ki FA (µM) 
A. 
Wild-type CATIII                    599 (a)      11.6 (a)     279 
F24A/L29A                           500          33           133 
Y25F                                258 (b)      15 (b)       111 
N146F                               690          42           >2500 
Y168F/I172V                         358          38           255 
B. 
Q92C/N146F                          243          20           44 
Q92C/N146F/Y168F/I172V              377          20           5.4 
F24A/L29A/Q92C/N146F/Y168F/I172V    112          48           3.8 
Wild-type CATI                      97 (c)       11 (c)       1.5 

(a) Data from Lewendon et al. (1988). 
(b) Data from Murray et al. (1991a). 
(c) Data from Murray et al. (1991b). 



prepared and analysed by steady state kinetic 
methods in an earlier study (Murray et al., 1991a). 
The substitutions Q92C, A105S and N146F were 
each introduced into CATIII as individual point 
mutations, whereas the F24A/L29A and Y168F/I172V 
double mutants were prepared using a single 
round of mutagenesis taking advantage of the close 
proximity of the codons of both pairs of residues in 
the CAT structural gene. The steady state kinetic 
parameters for chloramphenicol acetylation and 
inhibition by sodium fusidate for four such variants 
are presented in Table 2A. Data were not 
determined for the Cys92 and Ser105 enzymes 
because preliminary experiments were indicative of 
a fusidate sensitivity very similar to that of 
wild-type CATIII in each case. With the exception of 
the N146F variant, each of the residue substitutions 
has only a minor influence on steroid binding. 
However, the substitution of Asn146 by phenylalanine 
effectively abolishes competitive binding of 
fusidate by CATIII. Indeed, whereas inhibition of 
the N146F enzyme is seen at very high concentrations 
of inhibitor (>2.5 mM) it is impossible to 
distinguish between the effect of inhibitor binding 
to CAT and probable indirect effects of the 
detergent-like fusidate molecule on the partitioning 
of chloramphenicol between enzyme and solvent 
water (data not shown). Loss of affinity for the 
inhibitor accompanying the N146F mutation does 
not appear to be a consequence of any global 
changes in the integrity of the enzyme as judged by 
the near-wild-type kinetic properties of N146F in 
the acetyl transfer reaction. Therefore, and in spite 
of the fact that the N146F substitution has an effect 
opposite to that predicted, it seemed plausible that 
Phe146 might have an important role in binding of 
the inhibitor. In the structure of the binary complex 
of CATIII and chloramphenicol, the side-chain of 
Asn146 is the centre of an extended hydrogen bond 
network which includes the amide side-chain of 
Gln92 and the phenolic hydroxyl of Tyr168 (Figure 3). 
As each of these hydrogen bond acceptors 
and donors is substituted by more hydrophobic 
amino acids (Cys92, Phe146, Phe168) in CATI it 
seemed likely that concerted substitution of all 
three residues might be necessary to produce a 
CATI-like fusidate binding phenotype. To that end 
we constructed and purified the Q92C/N146F 
double and Q92C/N146F/Y168F/I172V quadruple 
mutants of CATIII and determined their kinetic 
parameters (Table 2B). Introduction of Cys92 in the 
context of Phe146 enhances binding >50-fold 
compared to N146F, yielding an enzyme with 
sixfold greater fusidate affinity (Ki = 44 µM) than 
that of wild-type CATIII. When these two substitutions 
are combined with those of Y168F and I172V 
the fusidate inhibition constant (Ki) falls to 5.4 µM, 
a 50-fold enhancement in affinity over that of 
wild-type CATIII and approaching that of the type 
I variant. Further modification via the F24A and 
L29A substitutions results in only a modest increase 
in binding affinity (Ki = 3.8 µM), supporting the 
conclusion of the chimaeric enzyme experiments 
that residues of the N-terminal segment of CATI 
play a limited role in steroid binding. 
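
The fold enhancements quoted above follow directly from ratios of the inhibition constants, and can be restated as approximate changes in binding free energy. The sketch below is illustrative only: it treats Ki as a dissociation constant, uses ΔΔG = RT ln(Ki,variant / Ki,wild-type), and assumes a temperature of 25 °C, which is not stated in this section. The function names are hypothetical.

```python
import math

R_KJ = 8.314e-3   # gas constant, kJ mol^-1 K^-1
TEMP_K = 298.15   # assumed temperature (25 °C), not given in the text

KI_WILD_TYPE_CATIII = 279.0  # µM, from the text

# Fusidate Ki values (µM) quoted in the text for substituted CATIII variants.
KI_VARIANTS = {
    "Q92C/N146F": 44.0,
    "Q92C/N146F/Y168F/I172V": 5.4,
    "F24A/L29A/Q92C/N146F/Y168F/I172V": 3.8,
}


def fold_enhancement(ki_variant, ki_reference=KI_WILD_TYPE_CATIII):
    """Ratio of reference Ki to variant Ki (larger means tighter binding)."""
    return ki_reference / ki_variant


def ddg_binding(ki_variant, ki_reference=KI_WILD_TYPE_CATIII, temp_k=TEMP_K):
    """Approximate change in binding free energy (kJ/mol); negative is tighter."""
    return R_KJ * temp_k * math.log(ki_variant / ki_reference)


for name, ki in KI_VARIANTS.items():
    print(f"{name}: {fold_enhancement(ki):.1f}-fold, "
          f"ddG = {ddg_binding(ki):.1f} kJ/mol")
```

On this reading, each tenfold gain in affinity corresponds to roughly 5.7 kJ/mol at the assumed temperature, so the quadruple substitution accounts for most of the difference between the two wild-type enzymes.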

Crystal structure of the 
fusidate-CATIII(Q92C/N146F/Y168F/I172V) complex 

The structure of the Q92C/N146F/Y168F/I172V 
variant of CATIII was determined at 2.2 Å resolution 
in the presence of bound fusidate (Table 3). 
With the exception of the substituted residues and the 
different bound ligands the structure is essentially 



[Figure 3 residue labels: Gln92, Phe135, Asn146, Tyr25, Ile172, His195, Tyr168, Cys31] 

Figure 3. Hydrogen bond interactions in the complex of 
wild-type CATIII and chloramphenicol. NE2 of His195, a 
critical catalytic residue, acts as a general base to abstract 
the hydroxyl proton of chloramphenicol as the initial 
chemical step of the acetyl-transfer reaction. Residues 
Gln92, Asn146 and Tyr168, which form an extended 
hydrogen bond network on one side of the substrate 
binding site (Leslie, 1990), are each replaced by 
hydrophobic amino acid side-chains in CATI. 
Figure 4. Fusidate bound in the chloramphenicol binding site of the Q92C/N146F/Y168F/I172V variant of CATIII. 
The solvent-accessible surface (sphere radius 1.5 Å) was calculated for all amino acid side-chains which contained atoms 
positioned within 6 Å of atoms of the fusidate ligand. The four rings of the steroidal ligand are bound in a roughly 
cylindrical hydrophobic cavity with the A-ring at the base and D-ring closest to the surface of the trimer. The 
hydrophobic tail of the inhibitor projects out of the binding site onto the enzyme surface. 



identical to that of wild-type CATIII in complex 
with chloramphenicol (Leslie, 1990). The RMS 
differences between the structures of wild-type and 
quadruply substituted variants are 0.20 Å and 
0.53 Å for main-chain and side-chain atoms, 
respectively. Two residues, Val139 and Thr140, are 
disordered in the structure and were deleted from 
the model. Such disorder reflects the loss of a 
stabilising hydrogen bond interaction between the 
side chains of Gln92 and Thr140 (which accompanies 
the Q92C substitution) and is also observed in 
the complex between the quadruple mutant and 
chloramphenicol (A.G.W.L., unpublished data). In 
addition, the side-chain of His144 occupies two 
distinct positions due to alternate χ1 angles of +60° 
(cf. wild-type CATIII) and -70°. As the peptide 
bond of His144 shows the highest deviation from 
planarity (12°) of any residue in wild-type CATIII, 
the alternate conformer of this side-chain may 
reflect the loss of some conformational constraint 
following substitution of Asn146 by phenylalanine 
in the mutant. 
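
An RMS difference of the kind quoted above is the root mean square of the distances between equivalent atoms in two structures. The following is a minimal sketch, assuming the two coordinate sets are already superimposed and listed in the same atom order. The superposition procedure used by the authors is not described in this section, and the coordinates below are hypothetical.

```python
import math


def rms_difference(coords_a, coords_b):
    """Root mean square distance (same units as input) between paired atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates in Å for two equivalent atoms in each structure.
wild_type = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
variant = [(0.0, 0.0, 0.2), (1.0, 0.2, 0.0)]

print(f"{rms_difference(wild_type, variant):.2f} Å")
```

Evaluating the function separately over main-chain and side-chain atoms would give a pair of figures analogous to the 0.20 Å and 0.53 Å reported here.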

The side-chains of Cys92, Phe146, Phe168 and 
Val172 and the atoms of the four ring systems of the 
fusidate molecule are well ordered in the CAT-inhibitor 
complex. The steroid A-ring is positioned at 
the base of the chloramphenicol binding pocket in 
close proximity to the imidazole side-chain of 
His195, an essential catalytic residue (Lewendon 
et al., 1994). The B, C and D rings also occupy the 
chloramphenicol site whereas the remainder of 
the inhibitor projects onto the enzyme surface and 
is more accessible to solvent. Atoms of the 
hydrophobic "tail", the O-acetyl, and carboxylate 
substituents of the inhibitor (Figure 1) are 
disordered. In spite of the apparent mobility of the 
latter it may contribute to fusidate binding via 
electrostatic interaction with the guanidinium of 
Arg28, a side-chain which occupies clearly defined 
density in this structure but is disordered beyond 
CB in crystals of wild-type CATIII. 

As might be expected for a hydrophobic steroidal 
ligand, extensive non-polar interactions are observed 
in the CAT-fusidate complex (Figure 4). The 
C-14 methyl substituent (C-21) of the C-ring is in 
van der Waals contact (<3.6 Å) with all six carbon 
atoms of the side-chain of Phe146. CD2 and CE2 of 
the same residue also interact with the C-12 
methylene group of the C-ring, supporting the 
earlier supposition that Phe146 is an important 
determinant of fusidate binding. CD2 of Leu160 
contacts the C-4 methyl substituent (C-18) on the 
A-ring of fusidate and CD2 of Leu29 is involved in 
a similar interaction with the C-15 methylene group 
of the D-ring. In all, there are 57 contact distances 
of <4 Å between atoms of fusidate and the protein 
(Table 4) and a further nine to four ordered water 
molecules in the binding site. Perhaps surprisingly, 
none of the remaining three substituted residues 
(Cys92, Phe168 and Val172) make direct contact 
Table 3. Refinement statistics for the CAT-Fusidate complex 

Diffraction data 
Reflections (6.0 to 2.2 Å)    13,517 (100% complete) 
R-value                       17.4% 

Atomic model    No. of atoms    Mean isotropic thermal parameter (Å²) 
Protein         1686            19.7 
Fusidate        37              41.0 
Solvent (a)     146             37.5 

Stereochemical              RMS deviation        Refinement restraint 
refinement parameter        from ideal values    weighting values 
Bond distances (Å)          0.018                0.020 
Bond angles (Å)             0.049                0.040 
Planar 1-4 distances (Å)    0.059                0.050 
Planes (Å)                  0.015                0.020 
Chiral volumes (Å³)         0.166                0.150 

(a) Includes 144 water atoms and two cobalt ions. 



with the bound inhibitor. However, the side-chains 
of Cys92 and Phe168 almost certainly contribute to 
the van der Waals binding energy via interactions 
with C-26/C-27 of the hydrophobic tail and atoms 
of the C-16 (O-acetyl) substituent, which are poorly 
ordered in the crystal. It is also probable that these 
substitutions are required to permit the appropriate 
orientation of Phe146 and to provide a more 
hydrophobic environment for binding of the apolar 
ring systems. The observed abolition of fusidate 
binding when the N146F substitution is made in 
isolation most likely reflects the inappropriateness 
of introducing a hydrophobic side-chain in the 
context of the relatively hydrophilic environment 
provided by Gln92 and Tyr168 of CATIII. In 
wild-type CATIII the amide nitrogen of Gln92 is 
involved in a hydrogen bond interaction with the 
side-chain hydroxyl of Thr140, stabilising the 
conformation of a surface loop (residues 137 to 142) 
adjacent to the chloramphenicol binding site. 
Elimination of this H-bond, via the Q92C substitution, 
facilitates a major re-orientation of the loop 
which would otherwise preclude binding of the 
hydrophobic tail of fusidate (Figure 5). The 
truncation of Tyr168 to phenylalanine is required to 
avoid a steric clash with the new position of residue 
146 and is accompanied by movement of the entire 
side-chain of Phe168 towards residue 172. This in 
turn necessitates the substitution of Ile172 by the 
shorter valine side-chain. 
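
The contact analysis described above (atom pairs separated by less than 4 Å) amounts to a distance search between ligand and protein atoms. The sketch below shows the idea with hypothetical atom names and coordinates. It is not the procedure or software used by the authors, which this section does not specify.

```python
import math


def contacts_within(ligand_atoms, protein_atoms, cutoff=4.0):
    """Return (ligand atom, protein atom, distance) for pairs closer than cutoff."""
    pairs = []
    for lig_name, lig_xyz in ligand_atoms.items():
        for prot_name, prot_xyz in protein_atoms.items():
            d = math.dist(lig_xyz, prot_xyz)  # Euclidean distance in Å
            if d < cutoff:
                pairs.append((lig_name, prot_name, round(d, 2)))
    # Shortest contacts first.
    return sorted(pairs, key=lambda pair: pair[2])


# Hypothetical coordinates in Å (illustration only, not from the structure).
ligand = {"C12": (0.0, 0.0, 0.0), "C15": (10.0, 0.0, 0.0)}
protein = {
    "CE2 Phe146": (3.0, 0.0, 0.0),
    "CD2 Leu29": (10.0, 3.0, 0.0),
    "CB Ala1": (20.0, 0.0, 0.0),
}

for lig_name, prot_name, d in contacts_within(ligand, protein):
    print(f"{lig_name:4s} {prot_name:12s} {d:.2f}")
```

Run over the full coordinate set of a complex, the length of the returned list would correspond to the number of contact distances tabulated for the CAT-fusidate complex.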

The 3-α-hydroxyl group (O-6) of the A-ring is 
involved in hydrogen bonds with both NE2 of 
His195 (2.4 Å) and the phenolic OH of Tyr25 (2.8 Å; 
Figure 6). Both Tyr25 and His195 are also involved 
in hydrogen bond interactions with chloramphenicol 
when it is bound to wild-type CATIII (Leslie, 
1990). As Tyr25 is replaced by phenylalanine in 
CATI it follows that the role of His195 is the more 
important in respect of both chloramphenicol and 
fusidate binding. Indeed, when Tyr25 of CATIII is 
replaced by phenylalanine it results in only minor 
changes in the kinetic parameters of the acetyl 
transfer reaction (Murray et al., 1991a). The 
observation that fusidate analogues wherein the 
3-α-hydroxyl is replaced by a β-OH (3-epifusidate) 
or keto substituent (3-oxofusidate) bind weakly to 
wild-type CATI (Bennett & Shaw, 1983) serves 
to emphasise the importance of the hydrogen 
bond with His195. Pre-steady state kinetic analyses 
(Day et al., 1995) show that the rate of fusidate 
dissociation from wild-type CATI increases when 
His195 is replaced by alanine but that the equivalent 
substitution in CATIII has no effect on inhibitor 
dissociation, implying that the hydrogen bond 
interaction is absent in wild-type CATIII. Indeed, 
one explanation of the requirement for the concerted 
substitution of four residues in the chloramphenicol 
binding site of CATIII is that they are necessary to 
permit access of the inhibitor to the base of the 
substrate binding pocket to form the hydrogen bond 
with His195. 

NE2 of His195 acts as a general base to abstract
a proton from the primary (C-3) hydroxyl of
chloramphenicol as the initial step in the acetyl
transfer reaction. Although the 3-α-hydroxyl of
bound fusidate could in principle also donate its
proton to His195 it is ~1.6 Å removed from the
position occupied by the primary (C-3) hydroxyl of
chloramphenicol and in an entirely inappropriate
orientation to attack the thioester carbonyl of
acetyl-CoA and form a productive tetrahedral
intermediate after proton abstraction (Figure 7). In
addition, the C-2, C-3 and C-4 atoms of the A-ring
of fusidate partly overlap with the positions of
atoms of the tetrahedral intermediate (Lewendon
et al., 1990). This accounts for the inability of
fusidate binding CAT variants to acetylate the
steroidal inhibitor.

Using a sphere of radius 1.4 Å as probe we
calculated that 575 Å² (or 81%) of the solvent-accessible
area of fusidate becomes buried on binding to
CAT. A similar proportion (89%, 1.7 Å probe radius)
of progesterone is buried in its complex with 
the anti-progesterone monoclonal antibody DB3 
Table 4. CAT-fusidate contacts less than 4 Å

Fusidate atom   CAT atom        Distance (Å)

C1              OH  Tyr25       3.59
                CG2 Thr94       3.96
C2              OH  Tyr25       3.84
                CG2 Thr94       3.50
                CE1 Phe103      3.77
                CZ  Phe103      3.93
                O   Wat448      3.84
C3              OH  Tyr25       3.82
                NE2 His195      3.49
                O   Wat436      3.96
                O   Wat437      3.83
C4              O   Wat437      3.66
C9              CD2 Phe146      3.91
C11             CD2 Phe146      3.89
C12             CE2 Phe135      3.78
                CE2 Phe146      3.14
                CD2 Phe146      3.37
C13             CE2 Phe146      3.77
C15             CD2 Leu29       3.07
C17             CE2 Phe146      3.89
                CZ  Phe146      3.98
C18             CD2 Leu160      3.17
                NE2 His195      3.43
                CD2 His195      3.48
                O   Wat437      3.87
C19             CB  Phe146      3.89
C20             CE1 Phe24       3.85
                OH  Tyr25       3.47
                CE1 Tyr25       3.68
                CD2 Leu29       3.69
C21             CZ  Phe146      3.08
                CE2 Phe146      3.10
                CE1 Phe146      3.29
                CD2 Phe146      3.38
                CD1 Phe146      3.53
                CG  Phe146      3.57
C22             CZ  Phe146      3.86
                CE2 Phe146      3.93
C23             CE2 Phe135      3.94
                CE2 Phe146      3.67
                CZ  Phe146      3.90
                O   Wat314      3.66
C24             O   Wat314      3.68
C26             SG  Cys92       3.88
C27             SG  Cys92       3.66
                OG  Ser107      3.69
C28             CD2 Phe135      3.04
                CE2 Phe135      3.50
                CG  Phe135      3.82
                O   Wat314      3.41
C31             CE2 Phe168      3.78
                CD2 Phe168      3.99
C32             CE2 Phe168      3.11
                CD2 Phe168      3.28
O1              OH  Tyr25       3.54
                CE2 Phe135      3.86
                CZ  Phe135      3.92
O2              CZ  Phe146      3.90
O6              OH  Tyr25       2.79 (a)
                CZ  Tyr25       3.41
                CE1 Tyr25       3.93
                CZ  Phe103      3.59
                NE2 His195      2.42 (a)
                CE1 His195      3.15
                CD2 His195      3.33
                O   Wat436      3.73

(a) Hydrogen bond interactions.



(Arevalo et al., 1993), whereas dehydroandrosterone
is completely enclosed and sequestered from bulk
solvent when bound to cholesterol oxidase (Li et al.,
1993). However, in the latter case it is probable that
the eight-carbon hydrophobic tail of the substrate
cholesterol (which is replaced by a keto group in
dehydroandrosterone) projects out of the binding
pocket in a manner strictly analogous to that seen
in the CAT-fusidate complex. In each structure the
steroid binding site is primarily formed from the
side-chains of hydrophobic amino acid residues but
the precise spectrum of contacts is quite different in
each case. All three proteins utilise hydrogen bonds
to histidyl residues to stabilise the oxygen atom
(keto or hydroxyl) of the C-3 substituent of the
A-ring. In cholesterol oxidase His447, which is
hydrogen bonded to the 3-keto substituent via a
bridging water molecule, is implicated in each of
several postulated mechanisms for the oxidation
reaction (Li et al., 1993).

Tight binding of steroidal and other highly 
apolar molecules to proteins requires not only a 
high degree of surface and steric complementar- 
ity between ligand and binding site but also a 
mechanism to address the energetic cost of a 
hydrophobic binding pocket which is accessible to 
bulk solvent in the unliganded state. In the cases 
of progesterone binding to the DB3 Fab' (Arevalo 
et al., 1993), and of dehydroandrosterone to 
cholesterol oxidase (Li et al., 1993), desolvation is
achieved by conformational changes in the pro- 
tein which accompany ligand binding. In the 
unliganded state the binding site of DB3 exists in 
a "closed" conformation, such that its hydro- 
phobic amino acids are shielded from bulk 
solvent, which can convert to an open state to 
permit access of the progesterone. In cholesterol 
oxidase, the substrate binding site is completely 
isolated from bulk solvent both in the absence 
(when the site is occupied by ordered water 
molecules) and in the presence of the steroid. 
Thus, a model for steroid binding to cholesterol 
oxidase requires that the ligand binding cavity 
first open to permit binding then reclose around 
the substrate, with the concomitant displacement 
of the ordered water. In contrast to the above 
examples, the binding of fusidate to CAT does not 
appear to be accompanied by conformational 
changes of the protein to facilitate its access to 
the active site cleft at each subunit interface. This 
implies that CATIII can tolerate the (presumably
destabilising) effect of exposing three "new" 
hydrophobic residues to solvent in the unliganded 
state. In this respect it is perhaps significant that 
wild-type CATIII is an extremely robust enzyme,
being highly resistant to thermal denaturation
(Lewendon et al., 1988) and remaining folded and
trimeric in 8 M urea (P.J.D., unpublished experiments).
Although the stability of the
Q92C/N146F/Y168F/I172V variant has not been
studied, it should be noted that CATI, the
chloramphenicol binding site of which is even
more hydrophobic than that of the CATIII
quadruple mutant, is both less thermostable and
less soluble than wild-type CATIII (Day et al.,
1995).
Figure 5. Surface loop movement accompanying the Q92C substitution. The structures of wild-type and the
Q92C/N146F/Y168F/I172V variant of CATIII were aligned by superposition of the main-chain atoms of residues 10
to 130 and 150 to 210. Residues 92 and 135 to 146 of the wild-type (white) and substituted (green) variants are shown
in addition to the fusidate molecule bound to the latter. Note that the side-chains of residues 136-137 and 142-145 have
been deleted from the image for the purpose of clarity. Replacement of Gln92 by cysteine eliminates the hydrogen bond
to Thr140 permitting the movement of a surface loop (residues 137 to 142) which would otherwise preclude fusidate
binding. Although Val139 and Thr140 are disordered in the CAT-fusidate complex and therefore not shown, the loop
movement is revealed by a shift of several Å in the position occupied by Pro141.



Conclusions 

We have used site-directed mutagenesis and
X-ray crystallography to investigate the mechanism
by which a single enzyme active site can bind two
competing but chemically dissimilar ligands with
approximately equal avidity. The structure of
fusidate bound to the Q92C/N146F/Y168F/I172V
variant of CATIII reveals how a single protein can
confer resistance in vivo to two entirely different
classes of antimicrobial agent and is the third
example (after β-lactamase and CAT itself) of an
antibiotic resistance mechanism which is understood
at the atomic level. It has been suggested that




Figure 6. Intermolecular hydrogen bonds between bound ligand and the side-chains of Tyr25 and His195 occur in
both CAT-chloramphenicol and CAT-fusidate complexes.

Figure 7. Superposition of bound fusidate inhibitor and the structure of the complex of wild-type CATIII and the
tetrahedral oxyanion intermediate of the acetyl-transfer reaction. In the acetyl-transfer reaction catalysed by CAT,
abstraction (by NE2 of His195) of the 3-hydroxyl (O-15) proton of chloramphenicol facilitates nucleophilic attack at the
carbonyl carbon of acetyl-CoA leading to the formation of an oxyanion tetrahedral intermediate (O-) which is stabilised
by a hydrogen-bond interaction with the side-chain of Ser148 (Lewendon et al., 1990). While, in principle, proton
abstraction from the 3-α-hydroxyl (O-6) substituent of fusidate may also occur, the position and orientation of this
hydroxyl group (and steric hindrance due to overlap with the C-2 and C-3 carbon atoms of the A-ring) preclude
formation of a productive tetrahedral intermediate, accounting for the observed failure of CAT to acetylate the steroidal
inhibitor. For the purpose of clarity only the A, B and C rings of fusidate and those parts of CoA immediately proximal
to the tetrahedral intermediate (TI) are shown.



co-administration of chloramphenicol and fusidate
(or fusidate analogues) might offer a possible route
to overcome CAT-mediated chloramphenicol resistance
in clinical practice (Davies, 1994). Our
structural data suggest that such a strategy may not
be promising, since, of the diverse range of
naturally occurring CAT variants, it is only CATI
that carries the requisite motif of amino acid
residues in the chloramphenicol binding site to
generate a high affinity fusidate site. Conversely, it
is apparent that mutations leading to the substitution
of amino acids within the chloramphenicol
binding site of a fusidate-binding CAT variant (c.f.
CATI) might be expected to result in loss of
inhibitor affinity without serious impairment of
catalytic competence in the acetyl transfer reaction
or consequent chloramphenicol resistance. It is not
apparent whether CATI-mediated fusidate resistance
is an evolved phenomenon or merely the result
of a serendipitous arrangement of side-chains at the
active site of this one variant. As only four of the 17
residues of the chloramphenicol binding site are
conserved among known CAT variants it has not
been possible to infer the evolutionary relationships
between members of the family. Nonetheless, in the
context of fusidate resistance, it is probably
significant that CATI is of enterobacterial origin
(including numerous genera that are relatively
insensitive to fusidate due to limited outer
membrane permeability) whereas naturally occurring
CAT variants isolated from Gram-positive
genera (commonly fusidate-sensitive) do not, to the
best of our knowledge, confer resistance.

The CAT-fusidate complex is the third example of 
the determination of the structure of a protein 
bound to a steroidal ligand and one wherein the 
architecture of the binding site and the likely 
mechanism of ligand binding are both quite distinct 
from those observed in previously published 
structures. Because the Q92C/N146F/Y168F/I172V 
variant of CATIII is only slightly compromised in
the acetyl-transfer reaction (Table 2B) we believe 
that the active site of the protein is not significantly 
deformed in the absence of bound fusidate. This 
implies that the hydrophobic residues of the 
fusidate binding site are solvent-exposed in the 
unliganded enzyme and that, in contrast to 
cholesterol oxidase and the anti-progesterone Fab' 
DB3, significant conformational changes are not
required to facilitate steroid binding. It is not known 
whether the determinants and dynamics of ligand 
binding to the effector recognition domains of 
steroid receptors are analogous to those exemplified
by the structures of the CAT-fusidate complex 
(binding at a preformed cleft), to that of DB3 and 
progesterone (opening of a closed binding site), or to 
the cholesterol oxidase-dehydroandrosterone com- 
plex (gating and reclosure of a preformed cavity). 
However, it is probable that the several discrete but 
linked functions of receptors of the steroid/vitamin D/thyroid
hormone superfamily (Evans, 1988)
will involve additional structural responses to ligand 
binding, favouring dimerisation and modulating the 
specificity and affinity of receptor interaction with 
DNA response elements. 

Materials and Methods 

Construction of recombinant genes and 
site-directed mutagenesis 

Regions of the DNAs encoding CATI (Alton & Vapnek,
1979) and CATIII (Murray et al., 1988) were recombined
in vitro using the technique of "sticky-feet" directed
mutagenesis (Clackson & Winter, 1989). Single and
multiple point mutations of CATIII were introduced using
oligonucleotide primers and single-stranded M13 DNA
templates loaded with deoxyuridine by preparation in the
dut ung E. coli strain RZ1032 (Kunkel et al., 1987). The
presence of desired mutations and the absence of second
site changes were confirmed by DNA sequence determination
of both the complete coding and the 5' non-coding
regions of the genes.

Expression and purification of CAT 

Wild-type and mutant CAT proteins were expressed in 
E. coli JM101 after subcloning the coding sequences in
pUC18. Enzymes were purified from cell-free extracts by
affinity chromatography using chloramphenicol-Sepharose
(Lewendon et al., 1988) or by ion-exchange
chromatography (DEAE-Sephacel) followed by affinity
chromatography using Cibacron-blue Sepharose (Murray
et al., 1991a). Homogeneity of purified enzymes was
confirmed by SDS-polyacrylamide gel electrophoresis and
enzyme concentrations were determined by the method
of Lowry et al. (1951).

Assay of CAT activity and kinetic determinations 

CAT activity was assayed spectrophotometrically at
25°C as described previously (Lewendon et al., 1990).
Standard assays contained 0.4 mM acetyl-CoA (prepared
from the lithium salt of CoA, Pharmacia, by the method
of Simon & Shemin, 1953), 0.1 mM chloramphenicol and
1.0 mM 5,5'-dithiobis(2-nitrobenzoic acid) in TSE buffer
(50 mM Tris-HCl (pH 7.5), 100 mM NaCl, 0.1 mM EDTA).
One unit is defined as the amount of enzyme required to
convert 1 µmol of chloramphenicol to 3-acetylchloramphenicol
in one minute using the standard assay.
Concentrations of acetyl-CoA and chloramphenicol were
varied in the standard assay for determination of steady
state kinetic parameters and all assays were carried out
in triplicate. Initial rate values were used to construct
double reciprocal plots and kinetic parameters were
derived from slope and intercept replots (Kleanthous &
Shaw, 1984). Ki values for competitive inhibition by
sodium fusidate were determined by varying the
concentrations of chloramphenicol and inhibitor in
standard assays containing a fixed concentration of
acetyl-CoA (routinely ~5 × Km), and were calculated
from linear slope replots derived from double reciprocal
plots. Crude extracts of chimaeric enzymes which
appeared to be inactive in the standard assay were
screened for low levels of CAT activity using a sensitive
radiometric assay (Gorman et al., 1982).
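
The slope-replot procedure for Ki can be illustrated with a minimal sketch in Python. It assumes simple competitive inhibition, v = Vmax[S] / (Km(1 + [I]/Ki) + [S]), so that the slope of each double reciprocal plot is (Km/Vmax)(1 + [I]/Ki) and a replot of slope against [I] has intercept/slope = Ki. The function names and all numerical values below are hypothetical and are not taken from the experiments described here.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def competitive_rate(s, i, vmax, km, ki):
    """Initial rate under the assumed competitive-inhibition model."""
    return vmax * s / (km * (1.0 + i / ki) + s)


def ki_from_slope_replot(substrate_concs, inhibitor_concs, rates):
    """rates[j][k] is the initial rate at inhibitor j and substrate k.

    Step 1: fit 1/v against 1/[S] at each inhibitor concentration.
    Step 2: replot those slopes against [I]; Ki = intercept / slope.
    """
    recip_s = [1.0 / s for s in substrate_concs]
    lb_slopes = []
    for row in rates:
        slope, _ = linear_fit(recip_s, [1.0 / v for v in row])
        lb_slopes.append(slope)
    replot_slope, replot_intercept = linear_fit(inhibitor_concs, lb_slopes)
    return replot_intercept / replot_slope


# Hypothetical, noise-free demonstration data (arbitrary units).
substrate = [5.0, 10.0, 20.0, 50.0, 100.0]
inhibitor = [0.0, 2.0, 5.0, 10.0]
demo_rates = [
    [competitive_rate(s, i, vmax=100.0, km=12.0, ki=5.0) for s in substrate]
    for i in inhibitor
]
estimated_ki = ki_from_slope_replot(substrate, inhibitor, demo_rates)
```

With real triplicate data the replot would carry experimental error, so a weighted or non-linear fit may be preferable; the linear replot is shown only because it mirrors the procedure described in the text.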

Crystallisation and structure determination 

Single crystals of the Q92C/N146F/Y168F/I172V
variant of CATIII were prepared by microdialysis using
small "Lucite" buttons as described previously
(Leslie, 1990). Each button contained 25 µl of protein
(~5.7 mg ml⁻¹) in 10 mM MES buffer (pH 6.3), or the
same buffer supplemented with 0.5 mM sodium fusidate,
and were dialysed at 4°C against 10 ml of 10 mM MES
(pH 6.3) containing 2 to 4% (v/v) 2-methyl-2,4-pentanediol,
1 mM hexamine cobalt (III) chloride, 0.1 mM
dithiothreitol and either 1 mM chloramphenicol or
0.5 mM sodium fusidate. Crystals were isomorphous
with those of the wild-type enzyme; space group
R32 (equivalent hexagonal cell dimensions a = 107.8 Å,
c = 124.1 Å). 80° of data were collected to 2.2 Å resolution
from a single crystal (dimensions 260 µm ×
240 µm × 120 µm) grown in the presence of fusidate using
CuKα radiation from a GX13 rotating anode generator
with double mirror collimation. Data were recorded on a
prototype Hendrix-Lentfer image plate scanner with a
diameter of 18 cm. The images were integrated with
MOSFLM (Leslie, 1992) and programs from the CCP4
Suite (1994). A total of 68,511 observations were reduced
to a unique dataset of 14,238 reflections with a
crystallographic merging R-factor of 9.5% (24.9% in the
highest resolution range). The dataset is 99% complete out
to 2.2 Å resolution, with an overall I/σ(I) ratio of 18.3 (7.6
at 2.2 Å resolution).
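
As an illustration of the data-reduction statistics quoted above, the following Python sketch computes the multiplicity and a merging R-factor for a list of intensity observations. It assumes the conventional definition R_merge = Σ|I_i − ⟨I⟩| / ΣI_i summed over all observations of each unique reflection, and assumes symmetry-equivalent indices have already been mapped onto a single unique index; the function name and the toy intensities are hypothetical and are not the measured data.

```python
from collections import defaultdict


def merge_statistics(observations):
    """observations: iterable of ((h, k, l), intensity) pairs.

    Indices are assumed to be reduced to the asymmetric unit already,
    so repeated measurements share the same (h, k, l) key.
    """
    observations = list(observations)
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)

    numerator = 0.0
    denominator = 0.0
    for values in groups.values():
        mean_i = sum(values) / len(values)
        numerator += sum(abs(v - mean_i) for v in values)
        denominator += sum(values)

    return {
        "unique": len(groups),
        "multiplicity": len(observations) / len(groups),
        "r_merge": numerator / denominator,
    }


# Toy data: three unique reflections, five observations in total.
toy = [
    ((1, 0, 0), 100.0),
    ((1, 0, 0), 110.0),
    ((0, 1, 0), 50.0),
    ((0, 1, 0), 50.0),
    ((0, 0, 1), 20.0),
]
toy_stats = merge_statistics(toy)

# Multiplicity implied by the figures quoted in the text.
reported_multiplicity = 68511 / 14238
```

For the dataset described here, 68,511 observations reduced to 14,238 unique reflections corresponds to an average multiplicity of about 4.8.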

The refined structure of the CATIII-chloramphenicol
binary complex (Leslie, 1990) was used as a starting
model for refinement. The chloramphenicol and all
water molecules in the chloramphenicol binding pocket
were removed from the model and the substitutions
Q92C, N146F, Y168F and I172V made. This model was
subjected to alternating rounds of refinement using the
CCP4 programs SFALL, PROTIN and PROLSQ and
manual rebuilding using the interactive graphics program
O (Jones et al., 1991). After one round of refinement
there was very clear density compatible with a model of
fusidate derived from the crystal structure of fusidic acid
methyl ester p-bromobenzoate (Cooper & Hodgkin, 1968)
and this was included in the model for all subsequent
refinement. After four rounds of model building and
refinement the final R-factor was 17.4% for all reflections
in the resolution range 6.0 to 2.2 Å. Computer graphics
images were produced using the conic option (Huang
et al., 1991) of the MidasPlus program (Ferrin et al., 1988).
ABSTRACT

The Plasmodium falciparum malaria parasite is the
causative agent of malaria tropica. Merozoites, one of
the extracellular developmental stages of this parasite,
expose at their surface the merozoite surface protein-1
complex (MSP-1), which results from the proteolytic
processing of a 190-200 kDa precursor. MSP-1 is
highly immunogenic in humans and numerous studies
suggest that this protein is an effective target for a
protective immune response. Although its function is
unknown, there are indications that it may play a role
during invasion of erythrocytes by merozoites. The
parasite-derived msp-1 gene, which is ~5000 bp long,
contains 74% AT. This high AT content has prevented
stable cloning of the full-size gene in Escherichia coli
and consequently its expression in heterologous
systems. Here, we describe the synthesis of a 4917 bp
gene encoding MSP-1 from the FCB-1 strain of
P.falciparum adjusted for human codon preferences.
The synthetic msp-1 gene (55% AT) was cloned,
maintained and expressed in its entirety in E.coli as
well as in CHO and HeLa cells. The purified protein is
soluble and appears to possess native conformation
because it reacts with a panel of mAbs specific for
conformational epitopes. The strategy we used for
synthesizing the full-length msp-1 gene was to
assemble it from DNA fragments encoding all of the
major proteolytic fragments normally generated at the
parasite's surface. Thus, after subcloning we also
obtained each of these MSP-1 processing products as
hexahistidine fusion proteins in E.coli and isolated
them by affinity chromatography on Ni2+ agarose. The
availability of defined preparations of MSP-1 and its
major processing products opens up new possibilities
for in-depth studies at the structural and functional level
of this important protein, including the exploration of
MSP-1-based experimental vaccines.



DDBJ/EMBL/GenBank accession no. AJ131294 



INTRODUCTION 

Malaria caused by Plasmodium falciparum infections continues
to be a serious health problem in major parts of the world. The
identification of targets for immunologic interventions against
this infectious disease remains, therefore, an important goal.
Merozoites, which are the extracellular developmental form of
the parasite that invade erythrocytes, expose on their surface a
protein complex, which is the processing product of a 190-200 kDa
precursor known as merozoite surface protein-1 (MSP-1; 1-5).
Upon deposition in the merozoite membrane via a
glycosylphosphatidylinositol (GPI) anchor, this precursor is proteolytically
cleaved during late schizogony into fragments which remain
associated with the parasite surface. Sequence analysis of msp-1
genes of different P.falciparum strains has revealed that major
regions of the protein are dimorphic belonging to either the K1 or
the MAD20 prototype (Fig. 1A) while other parts are highly
conserved (3). Two small regions near the N-terminus show higher
variability. These features, as well as the presence of point mutations
scattered throughout the molecule and evidence for intragenic
recombination and/or gene conversion, confer a surprisingly limited
polymorphism to these abundant surface proteins (3,5).

A number of experimental findings suggest that MSP-1 of
P.falciparum may elicit a protective immune response against
infections by the parasite. For example, in the rodent model,
immunization of mice with the analogous protein of Plasmodium
yoelii yielded protection (6), as did the transfer of monoclonal
antibodies and immune serum against this protein (7-9).
Sero-epidemiologic data (10,11) and results from several vaccination
trials conducted with various P.falciparum-derived MSP-1
preparations in non-human primates also support the candidacy of this
protein or parts thereof as promising components of a subunit
vaccine against malaria tropica. In these trials, Saimiri or Aotus
monkeys were immunized either with MSP-1 isolated from
parasites (12-15) or with synthetic peptides or recombinant MSP-1
fragments (15-21). The recombinant fragments assessed most
recently were primarily from the C-terminal region of MSP-1.
Although the data from these trials support MSP-1 as a vaccine
candidate, the MSP-1 protective effects measured in these trials
barely satisfy requirements for statistical significance because group
sizes were generally too small and since collected data were not
confirmed in strictly comparable, subsequent trials. The two main
reasons for this situation are the scarceness of suitable experimental
animals and the difficulties associated with preparing sufficient
amounts of well-defined MSP-1 from parasites. On the other hand,
expression of full-length recombinant MSP-1 in heterologous
systems has turned out to be most difficult if not impossible (22,23).
This appeared primarily due to the high AT content of P.falciparum
DNA which prevents the cloning and stable maintenance of large
genes in Escherichia coli, thereby precluding crucial studies at the
genetic and biochemical level which may have led to the elucidation
of its function.

Thus, we decided to synthesize a 4917 bp polynucleotide
encoding MSP-1 of the Colombia FCB-1 strain (24,25) and change
the AT content such that it could be maintained and expressed in a
variety of hosts. Herein, we describe the design and synthesis of this
gene based on human codon frequencies. We also report the cloning
of this synthetic gene and its controlled expression in E.coli and in
mammalian cells. Purification of full-length protein from both
expression systems yielded material that is recognized by several
monoclonal antibodies known to interact with conformational
epitopes. Moreover, the strategy employed for the synthesis of the
gene allowed for subcloning, production and purification of all
major processing products of MSP-1. The availability of large
amounts of MSP-1 and its fragments opens up new possibilities for
the thorough investigation of this prominent malaria antigen.

MATERIALS AND METHODS 

Sequence design 

The amino acid sequence of MSP-1 of the Colombia FCB-1 strain
(25) was translated into a DNA sequence with an average codon
composition similar to that found in human coding sequences (26).

This was achieved by using a random number generator to
make each codon assignment, a process that proved to be truly
random, because each run of the program yielded a different
synthetic allele of MSP-1. One sequence was chosen as the master
sequence and modified in a number of ways to eliminate
sequences that might be detrimental to efficient transcription and
translation of the synthetic gene. All analysis programs mentioned
below were from the Genetics Computer Group program
collection (27). Positions where the introduction of additional
endonuclease cleavage sites appeared feasible without changing the
amino acid sequence were identified with the 'Map' program using
the option 'silent'. 'FindPatterns' was used to search for consensus
sequences that are indicative of prokaryotic promoters, poly(A)
signals and exon-intron boundaries. Prokaryotic factor-independent
RNA polymerase terminator structures were identified with the
'Terminator' program. Inverted repeats which might lead to the
formation of undesirable secondary structures were identified with
the 'StemLoop' program. All these structures, when encountered,
were eliminated by using alternative codons. Moreover, long runs of
purines (>7 nt) that may cause transcriptional termination in some
viral systems were disrupted. Finally, the stability of the resulting
RNA molecule was assessed with 'FoldRNA'. This analysis was
performed on overlapping fragments because the software restricted
the length of the input sequence to 1200 bases. Any structures more
stable than the mRNA of the human glyceraldehyde-3-phosphate
dehydrogenase were eliminated.
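
The random codon assignment and two of the sequence checks described above can be sketched in Python as follows. This is only an illustration of the idea, not the program that was used: the codon weights are placeholder values covering five amino acids (they are NOT real human codon usage frequencies), and the function names are hypothetical.

```python
import random
import re

# Placeholder weights for illustration only; a real design would use a
# complete table of human codon frequencies for all 20 amino acids.
CODON_WEIGHTS = {
    "M": {"ATG": 1.0},
    "K": {"AAG": 0.6, "AAA": 0.4},
    "L": {"CTG": 0.4, "CTC": 0.2, "CTT": 0.1,
          "TTG": 0.1, "CTA": 0.1, "TTA": 0.1},
    "E": {"GAG": 0.6, "GAA": 0.4},
    "F": {"TTC": 0.55, "TTT": 0.45},
}


def back_translate(protein, rng):
    """Pick one codon per residue at random, weighted by the table."""
    codons = []
    for residue in protein:
        table = CODON_WEIGHTS[residue]
        choice = rng.choices(list(table), weights=list(table.values()))[0]
        codons.append(choice)
    return "".join(codons)


def at_content(dna):
    """Fraction of A and T bases in the sequence."""
    return (dna.count("A") + dna.count("T")) / len(dna)


def purine_runs(dna, min_len=8):
    """Runs of purines longer than 7 nt, as (start index, run) pairs."""
    pattern = r"[AG]{%d,}" % min_len
    return [(m.start(), m.group()) for m in re.finditer(pattern, dna)]


# Each seed yields a (possibly different) synthetic allele of the
# same peptide; flagged purine runs would then be broken up by
# substituting alternative codons.
allele = back_translate("MKLEF", random.Random(0))
flagged = purine_runs(allele)
```

A full design pass would repeat the checks after every codon substitution, since removing one unwanted motif can create another.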



Design and synthesis of oligonucleotides 

The oligonucleotide primers used for the PCR-based synthesis of
high molecular weight, double-stranded DNA were restricted in
size to maximally 120 nt in order to ensure that at least 50% of the
PCR products were error free. The primers were designed using
the Oligo 4.0 program and the following parameters were
considered. The overlapping region between two oligonucleotides used
in one PCR reaction was on average 18 nt long. The internal
stability of each overlapping oligo pair was -ΔG > 9 kcal/mol.
Potential hairpin and duplex structures with -ΔG ≥ 8 kcal/mol
were eliminated as were false priming sites. The oligonucleotides
were synthesized on a 1000 Å pore size glass support using an
Applied Biosystems 394 synthesizer following standard protocols.
They were purified by electrophoresis in 10% polyacrylamide
(PAA) containing 7 M urea. Upon electroelution from gels, they
were precipitated with ethanol.



Asymmetric PCR-based synthesis of double-stranded DNA 
fragments 

The overall strategy of fragment synthesis is outlined in Figure 2.
In general, four assays were performed in parallel. The ratio of the
oligonucleotide pair used in each reaction was 5:1 in order to yield
the appropriate asymmetrically amplified product (28). Products
A, B, C and D, respectively, resulted from four PCR assays that
contained the oligonucleotide pairs 01/02 and 05/06 in ratios of
25:5, 03/04 and 07/08 in ratios of 5:25 pmol, respectively. Five
PCR cycles were performed in 50 µl of 10 mM Tris-HCl, 1.5 mM
MgCl2, pH 8.3, containing 2.5 U Taq polymerase (Boehringer,
Mannheim) and the four deoxynucleotide triphosphates (200 µM
each). The optimized cycling conditions (Omnigene TR3 CM220
thermocycler) were 10 s at 94°C (denaturation), 30 s at 55°C
(annealing) and 60 s at 72°C (polymerization). Product E was
prepared by combining the products from assays A and B, and F
from assays C and D. After amplifying for five cycles, we added
25 pmol of 01 and 08 to 5 pmol of E and F, respectively, to
re-establish asymmetric oligonucleotide compositions and amplified
for an additional eight cycles. Product G was prepared by
combining these latter assays and amplifying for another
12 cycles. PAGE was used to follow the various synthetic steps.
The final product G was separated from other reaction products
by electrophoresis in 1% agarose gels, eluted from the gel slice
according to the Qiaex II procedure (Qiagen, Düsseldorf),
digested with BamHI and ClaI and ligated into an appropriately
cleaved pBSK* plasmid. This plasmid is identical with pBSK
(Stratagene, Heidelberg) except that the XbaI and SpeI sites
within the multiple cloning site were replaced with NheI, MluI,
NcoI and StyI sites. The resulting vectors were transferred into
E.coli strain SG13009. DNA fragments of the expected size
liberated from plasmids of bacterial clones after BamHI/ClaI
cleavage were further analysed. Usually, 10-20 clones containing
such inserts were sufficient to identify either error-free
600-800 bp DNA fragments or fragments containing small
numbers of errors that could be eliminated by combining the
error-free portions of two fragments via an internal cleavage site.
All fragments were finally combined via their compatible unique
cleavage sites positioned at either end (Fig. 1B). After assembly
was complete, the msp-J^^ gene was sequenced in its entirety with 
a standard set of sequencing primers. 
Synthesis of full-size MSP-1 and of MSP-1 fragments in E.coli

The DNA encoding MSP-l^^, p83 (without signal peptide. Fig. 1 B), 
p30, p38, p42 and p19 (the latter without GPI anchor signal,
Fig. 1B), respectively, were transferred from pBSK* vectors to
the expression vector pDS56 (29) via their BamHI and ClaI
cleavage sites. By utilizing the BamHI insertion site of this vector,
six histidines were attached to the N-terminus of the respective
protein. This allows the purification of the resulting protein via
Ni2+ chelate chromatography (29).

The various vector constructs were transferred into the Lac
repressor providing E.coli SG13009. Cultures derived from
individual clones containing the proper plasmids were grown to
early log phase (OD600 0.2) and induced with IPTG (1 mM) for
3 h. With the exception of p38, all fragments as well as the
full-size p190 were produced in high yields (between 2 and 10%
of the total protein), but even p38 was readily purified in sufficient
quantity to allow characterization. For isolation of the various
expression products, cell pellets were dissolved in 6 M guanidinium
hydrochloride and applied directly to Ni2+ chelate columns.
Before the adsorbed material was eluted, the column was
developed with a reverse gradient running from 6 to 1 M urea in
0.5 M NaCl, 0.05 M Tris-HCl, 20% glycerol, pH 7.4. Elution
with an imidazole gradient (0-500 mM imidazole hydrochloride 
in 0.05 M Tris-HCl, 10% glycerol, pH 7) yielded MSP-IS2 as 
well as all the fragments in highly purified and soluble form. 

Expression of MSP-1-encoding DNA in mammalian cells

The msp-J^^ gene was inserted as a Mlu\-Cla\ fi^gment into 
plasmid pBi-5 (30) where it is co-regulated with the luciferase
gene by the bidirectional promoter Pbi-1. The activity of Pbi-1 is
entirely dependent on tTA, the tetracycline controlled transcriptional
activator (31). The resulting plasmid pBi-5/MSP-1 was used to
transiently express the gene in HeLa and CHO cells that
constitutively produce tTA. Thus, HtTA-1 (31) and CtTA-1 cells
were co-transfected with pBi-5/MSP-lSl and pUHD16-l following 
a modified (30) calcium phosphate method. The latter plasmid gives
rise to β-galactosidase which serves as a standard for determining
transfection efficiencies. MSP-1 expression was induced by
removal of doxycycline (Dox) from the culture and cells were
harvested after 30 h to determine luciferase activities in cell extracts
as described previously (31). The production of MSP-1 was
visualized by western blot analysis using monoclonal antibody mAb
5.2 (32) and compared with lysates prepared from uninduced cells.

HtTA-1 and CtTA-1 cell lines that control the synthesis of
MSP-1S1

To integrate the msp-1 gene controlled by Pbi-1 into the genome
of HtTA-1 cells, the cells were grown in 35 mm dishes to 40-50%
confluency and co-transfected with 2.9 µg of linearized plasmid
pBi-5/MSP-1S1 and 0.1 µg of linearized plasmid pHMR272 (30),
which confers resistance to hygromycin B. After 24 h, the cells
were transferred to 10 cm dishes and maintained in medium
containing 300 µg/ml hygromycin B. Resistant clones were
isolated, expanded separately and analysed for luciferase activity
(31) in the absence and presence of Dox (100 ng/ml). Several
clones which exhibited efficient regulation of luciferase activity
in a Dox-restricted manner were then analysed for tTA-dependent
production of MSP-1 by western blot analysis. Further subcloning
produced the cell line HtTA-93/9 which efficiently expressed
MSP-1 as well as the luc gene. The MSP-1-expressing CHO cell
line CtTA-27/29 was generated in an analogous way.

Purification of MSP-1 from HeLa cells by immunoaffinity
chromatography

HtTA-93/9 cells were grown to confluency in 10 cm dishes
containing EMEM medium supplemented with 10% FCS. Cells
from 20 such cultures were washed twice with PBS and
suspended in 4 ml TNET lysis buffer (150 mM NaCl, 5 mM
EDTA, 50 mM Tris-HCl, 1% Triton X-100, pH 7.4) containing
a cocktail of protease inhibitors (PMSF, aprotinin, antipain,
bestatin, pepstatin and leupeptin, each at 5 µg/ml, and 50 µg/ml
TLCK). The suspension was kept on ice for 30 min before it was
centrifuged at 300 000 g and 4°C for 30 min. The supernatant was
'cleared' by passing it through a 1.5 ml column packed with
protein A-Sepharose 4 fast flow (Pharmacia Biotech) and the
flow-through was collected. For immunoaffinity chromatography
the flow-through was then applied to a 1.5 ml mAb 5.2 (ATCC,
HB9148)-protein A-Sepharose 4 fast flow column (33) equilibrated
with TNET buffer. The column was washed with 5 bed vol of
TNET, pH 7.4, followed by 5 bed vol of TNET, 0.65 M NaCl,
pH 8.0, and by 2 bed vol of 150 mM NaCl, 5 mM EDTA, 50 mM
Tris-HCl, pH 6.8. Adsorbed protein was eluted with 0.1 M
glycine buffer, pH 2.5, and fractions were collected into a
previously titrated volume of 1 M Tris-HCl, pH 8.0. The protein
was stored in 20% glycerol at -20°C.

Analysis of MSP-1 isolated from E.coli and CHO cells by
western blot

MSP-1S2 purified from E.coli via Ni2+ chelate chromatography
and MSP-1S1 isolated from CtTA-27/29 cells by immunoaffinity
chromatography were subjected to electrophoresis in 8% PAA in
the presence of SDS (2%) but under non-reducing conditions.
Transfer of the protein onto Immobilon-P membranes (Millipore)
was carried out in transfer buffer (0.01% SDS, 25 mM Tris,
192 mM glycine, 20% methanol) for 90 min at 350 mA.
Approximately 0.1 µg of protein was applied per lane. The
membrane-bound proteins were exposed to the various monoclonal
antibodies and visualized via anti-mouse IgG AP-conjugate
(Sigma, A2179) following standard procedures (33).

RESULTS 

Design of a polydeoxyribonucleotide encoding an MSP-1
sequence

The MSP-1 coding sequence chosen for redesign is from 
P.falciparum strain FCB-1. Table 1 shows the bias towards A and T
in the codons of the parasite gene in comparison with codon
frequencies found in human coding sequences. By back-translating
the amino acid sequence of the msp-1 gene into DNA with human
codon frequencies, the AT content was reduced from 74 to 55%. The
redesigned gene was further modified to exclude sequences that may
be problematic during synthesis or cloning and expression of the
polynucleotide in various heterologous systems (Materials and
Methods). These included a perfect E.coli promoter sequence lying
upstream of a consensus-type translational start signal which gave
rise to efficient expression from an internal start site. By making use
of the degeneracy of the genetic code, we eliminated all the 
potentially problematic sequences by changing individual base pairs 
without affecting the encoded amino acid sequence. 
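
The back-translation step can be illustrated with a minimal Python sketch. It is not the authors' software: the function names (`back_translate`, `translate`, `at_content`) and the test peptide are hypothetical, and only a small excerpt of the human codon frequencies from Table 1 (Lys, Asn, Asp, Met) is used. Codons are drawn at random, weighted by the human frequency, which is one plausible reading of the "random number generator" mentioned in the footnote to Table 1.

```python
import random

# Human codon frequencies (%) for four amino acids, excerpted from Table 1.
HUMAN_CODON_FREQ = {
    "K": {"AAA": 42, "AAG": 58},
    "N": {"AAC": 54, "AAT": 46},
    "D": {"GAC": 53, "GAT": 47},
    "M": {"ATG": 100},
}

# The most AT-rich codon for each residue, mimicking the parasite bias.
AT_RICH = {"K": "AAA", "N": "AAT", "D": "GAT", "M": "ATG"}

# Reverse lookup used to confirm that the encoded protein is unchanged.
CODON_TO_AA = {c: aa for aa, freqs in HUMAN_CODON_FREQ.items() for c in freqs}


def back_translate(protein, rng):
    """Pick one codon per residue, weighted by human codon frequency."""
    codons = []
    for aa in protein:
        freqs = HUMAN_CODON_FREQ[aa]
        codons.append(rng.choices(list(freqs), weights=list(freqs.values()), k=1)[0])
    return "".join(codons)


def translate(dna):
    """Translate a DNA string using the excerpted codon table."""
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def at_content(dna):
    """Percentage of A and T bases in a DNA string."""
    return 100.0 * sum(base in "AT" for base in dna) / len(dna)


rng = random.Random(0)                      # fixed seed for reproducibility
protein = "MKNDKKNND"                       # hypothetical peptide
native_like = "".join(AT_RICH[aa] for aa in protein)
redesigned = back_translate(protein, rng)

print(translate(redesigned) == protein)     # protein sequence is preserved
print(at_content(native_like), at_content(redesigned))
```

Because every alternative codon here differs from the AT-rich one only by a G or C in the third position, the redesigned sequence can never have a higher AT content than the AT-rich version, while the encoded peptide stays the same.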
Table 1. Comparing the codon frequencies of the msp-1 gene of P.falciparum with human coding
frequencies reveals an extreme bias towards A/T-containing codons in the parasite DNA

                      Codon frequencies (%)
Amino               MSP-1       MSP-1       Human
acid      Codon     native      synthetic   coding seq.

Ala       GCA       53          22          23
          GCC        8          49          40
          GCG        2           0          10
          GCT       37          29          27

Arg       AGA       62          30          20
          AGG       10          17          20
          CGA       10           9          11
          CGC        0          26          19
          CGG        0           9          21
          CGT       19           9           9

Asn       AAC       22          58          54
          AAT       78          42          46

Asp       GAC       13          64          53
          GAT       87          36          47

Cys       TGC       10          60          55
          TGT       90          40          45

Gln       CAA       94          36          26
          CAG        6          64          74

Glu       GAA       94          38          42
          GAG        6          62          58

Gly       GGA       48          25          25
          GGC        7          43          34
          GGG        0          15          24
          GGT       46          18          17

His       CAC       26          57          59
          CAT       74          43          41

Ile       ATA       41          12          15
          ATC        6          55          49
          ATT       53          34          36

Leu       CTA        2           2           7
          CTC        3          30          20
          CTG        0          51          41
          CTT       17           8          13
          TTA       69           1           7
          TTG        9           8          13

Lys       AAA       86          35          42
          AAG       14          65          58

Met       ATG      100         100         100

Phe       TTC       33          79          55
          TTT       67          21          45

Pro       CCA       60          33          28
          CCC       10          27          32
          CCG        2           6          11
          CCT       29          35          28

Ser       AGC        5          27          24
          AGT       24           7          15
          TCA       42           7          15
          TCC        6          24          22
          TCG        2           2           6
          TCT       21          33          18

Thr       ACA       57          28          28
          ACC        6          41          36
          ACG        2          [illegible] 12
          ACT       35          22          24

Trp       TGG       [illegible] [illegible] [illegible]

Tyr       TAC       19          54          56
          TAT       81          46          44

Val       GTA       45           6          12
          GTC        5          33          24
          GTG        4          48          46
          GTT       46          13          18


The synthetic MSP-1 sequence was adjusted to the human codon frequencies using a random number generator.
Multiple additional adjustments, e.g. for generating unique cleavage sites, eliminating splice donor and acceptor
signals, etc., were made thereafter. The codon frequencies shown for the synthetic gene represent the final sequence
synthesized which, when compared with the native gene, contains base pair changes at 1317 positions.



Searching for hidden recognition sequences for restriction
endonucleases permitted us, again by incorporating single base
pair exchanges, to position unique cleavage sites at or near the
major processing sites where the MSP-1 precursor of FCB-1 is
proteolytically cleaved during schizogony (Fig. 1A). Thus, the
endonucleases SphI, BstEII and Eco47III cleave the synthetic
MSP-1 coding sequence within one, four and one amino acid,
respectively, of the processing sites which separate p19 from p29,
p29 from p38 and p38 from p30 (Fig. 1A). Moreover, the XmnI
site at position 2025 is within 28 amino acids of the putative
processing site that separates p83 from p30 (34). Finally, unique
cleavage sites were placed at either end of the full-size gene: a
MluI site at the 5'-end and a NotI as well as a ClaI site at the 3'-end
of the gene (Fig. 1B).
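
Checking that a cleavage site is unique can be sketched in a few lines of Python. The recognition sequences below are the standard ones for the enzymes listed; the helper names (`site_positions`, `unique_cutters`) and the toy sequence are hypothetical and are not taken from the synthetic msp-1 gene. All listed sites are palindromic, so scanning one strand is sufficient in this simplified sketch.

```python
# Recognition sequences (5'->3'); N stands for any base.
SITES = {
    "MluI": "ACGCGT",
    "HindIII": "AAGCTT",
    "ScaI": "AGTACT",
    "SphI": "GCATGC",
    "NotI": "GCGGCCGC",
    "ClaI": "ATCGAT",
    "XmnI": "GAANNNNTTC",
}


def site_positions(dna, site):
    """Return the 1-based start positions at which `site` matches `dna`."""
    hits = []
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if all(s == "N" or s == b for s, b in zip(site, window)):
            hits.append(i + 1)
    return hits


def unique_cutters(dna):
    """Map each enzyme that cuts exactly once to the position of its site."""
    result = {}
    for name, site in SITES.items():
        hits = site_positions(dna, site)
        if len(hits) == 1:
            result[name] = hits[0]
    return result


# Toy sequence (hypothetical): one MluI, two HindIII, one SphI, one ClaI site.
toy = "GCACGCGTATGAAAATCAAGCTTGGAAGCTTCCGCATGCTAATAGATCGATGG"
print(unique_cutters(toy))
```

In this toy sequence HindIII is excluded because it occurs twice; a single silent base change in one of the two sites would make the remaining site unique, which is the kind of adjustment described in the text.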



Because MSP-1S2 (Fig. 1B and C) lacks the signal peptide and the
anchoring signal, it should be synthesized and remain in the
cytoplasm. We, therefore, prepared two further genes with
modifications to the 5'- and 3'-ends. In one (msp-1^) the original signal
peptide at the 5'-end and the GPI anchor signal at the 3'-end flank
the coding sequence for the mature protein. In the other (msp-1S1),
the signal sequence, but not the anchor sequence, flanks the
coding sequence for the mature protein, which should permit
secretion of the protein but not membrane retention. While modifications at the
5'-end of the gene are facilitated by a unique HindIII site at position
116, those at the 3'-end are best achieved by synthesizing variants
of p19 which can be introduced via the unique SphI site.

The polynucleotide encoding MSP-1, complete with a signal 
peptide and GPI anchor signal, comprises 4917 bp whereas the 
[Figure 1 graphic: (A) map of MSP-1 (amino acids 1-1639) showing SP, p83, p30, p38, p29 and p19, the XmnI and SphI sites, and base pair positions 2025, 2724, 3788, 4575 and 4917; (B) flow chart of the gene synthesis from fragments 83, 30, 38, 29 and 19; (C) sequences at the termini, as follows.]

(C) Sequences at the termini

Gene with SP and GA:
  N-terminus: MluI - ATG AAA ATC (Met 1, Lys 2, Ile 3)
  C-terminus: AGC TTC ATC TAA TAG (Ser 1637, Phe 1638, Ile 1639, stop, stop) - ClaI

Gene with SP, without GA:
  N-terminus: MluI - ATG AAA ATC (Met 1, Lys 2, Ile 3)
  C-terminus: AGC TCT AAT TAA TAG (Ser 1619, Ser 1620, Asn 1621, stop, stop) - NotI - ClaI

Gene without SP and GA:
  N-terminus: BamHI - ATG GTG ACC (Met, Val 20, Thr 21)
  C-terminus: AGC TCT AAT TAA TAG (Ser 1619, Ser 1620, Asn 1621, stop, stop) - NotI - ClaI


Figure 1. Schematic outline of the primary structure of MSP-1 of P.falciparum (FCB-1) and strategy for the synthesis of its coding sequence. (A) The protein comprises
1639 amino acids including the signal peptide (SP) and the signal for GPI anchoring (GA). Conserved regions are depicted in white, dimorphic regions in grey. The
two blocks showing the highest variability are hatched. Upper arrows delineate the major processing products corresponding to p83-p19 as well as, according to the
nomenclature of Stafford et al. (34), SP and GA. The lower arrows indicate unique cleavage sites in the synthetic gene. They permit subdivision of the gene into
segments encoding the individual processing products. The processing site between p83 and p30 has not been defined experimentally. The sizes of the sequences in
bp encoding the various processing products of MSP-1 are depicted below. They allow calculation of molecular weights which for various reasons can significantly
deviate from those derived previously by electrophoretic mobility of the respective proteins (2). (B) Flow chart of the gene synthesis. Five DNA fragments (83-19)
were synthesized which encode the major processing products. They overlap their adjacent fragment by an average of 18 bp, which allows the fusion of neighbouring
fragments via common unique endonuclease cleavage sites as indicated. Fragment 83 encoding the signal peptide contains at its 5'-end an MluI cleavage site. The
5'-end of all other fragments contains a BamHI site. The 3'-end of p19 not encoding GA is followed by a NotI and a ClaI site whereas the version of p19
encoding GA contains only a ClaI site. The fragments were fused stepwise as indicated to yield msp-1S1. The GA signal of the parasite was introduced by an
appropriately modified fragment 19 to yield msp-1^ whereas the SP sequence was eliminated from msp-1S1 by inserting the appropriate oligonucleotide between the
MluI and a unique HindIII site at position 116. The resulting msp-1S2 gene can be inserted into expression vectors via its unique BamHI and ClaI (or NotI) cleavage
sites. (C) N- and C-terminal sequences of MSP-1^, MSP-1S1 and MSP-1S2 at the nucleotide and amino acid levels. The numbering of the amino acid positions is
according to Heidrich et al. (24).
Figure 2. Flow chart for the synthesis of a polynucleotide of 600-1100 bp in
length. Eight synthetic oligonucleotides (O1-O8) which overlap their respective
neighbouring sequences by an average of 18 nt were mixed pairwise in four
assays at the stoichiometry indicated. After five amplification cycles, products
A-D are obtained. Fragment A was combined with B and C with D, producing
E and F, respectively, after another five amplification cycles.
Asymmetry in DNA strand composition was reintroduced by amplifying E in
the presence of O1 and F in the presence of O8. The final product G was
prepared by combining the asymmetric mixtures of E and F and amplifying for
12 cycles. The product was purified from agarose gels. For fragments where 10 or
12 oligonucleotides are required as starting material, as in the synthesis of
oligonucleotides encoding p83, purified amplification products corresponding
to G and D or F were subjected to amplification steps III and IV to yield the final
products such as p83 DNA fragments I and II.

total sequence synthesized including two stop codons and 
flanking restriction cleavage sites comprised 4940 bp. 
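
The lengths quoted here can be cross-checked with simple arithmetic, sketched below in Python. The segment boundaries are the base pair positions printed in Figure 1A; assigning them to p83, p30, p38, p29 and p19 in that order is an assumption based on the order of the processing products in the figure, and the variable names are hypothetical.

```python
AMINO_ACIDS = 1639                       # including signal peptide and GPI anchor signal
coding_bp = AMINO_ACIDS * 3              # three bases per codon
total_synthesized_bp = 4940              # as stated in the text

# Base pair positions from Figure 1A, used as segment boundaries.
boundaries = [0, 2025, 2724, 3788, 4575, 4917]
names = ["p83", "p30", "p38", "p29", "p19"]
sizes = {n: end - start for n, start, end in zip(names, boundaries, boundaries[1:])}

stop_codon_bp = 2 * 3                    # two tandem stop codons
extra_bp = total_synthesized_bp - coding_bp
flanking_site_bp = extra_bp - stop_codon_bp

print(coding_bp)                         # 4917
print(sizes)
print(extra_bp, flanking_site_bp)        # 23 17
```

The 1639 codons account exactly for the 4917 bp coding sequence, and the remaining 23 bp of the 4940 bp construct correspond to the two stop codons (6 bp) plus 17 bp of flanking cleavage-site sequence.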

Synthesis and cloning of polynucleotides encoding MSP-1 
or portions thereof 

The full-length msp-1 gene sequence, designed above, was
subdivided into five overlapping fragments each corresponding
to one of the major processing products of MSP-1, namely p83,
p30, p38, p29 and p19 (Fig. 1B). They were synthesized using the
PCR-based procedure outlined in Figure 2 that allows efficient
production of double-stranded DNA fragments of up to 1200 bp
long. With the exception of DNA encoding p83, each fragment
was prepared from four pairs of overlapping synthetic
oligonucleotides that covered the fragment as part of the upper or lower
strand. To facilitate later assembly of adjacent fragments, the
terminal O1 oligonucleotides, which encoded the fragment
N-termini, also contained the unique 3' cleavage sites from the
upstream adjacent fragment and the terminal O8
oligonucleotides, encoding the fragment C-termini, contained the unique 5'
cleavage sites from the downstream adjacent fragments. Moreover,
every O1 oligonucleotide contained a BamHI site upstream of its
unique 5' cleavage site. Similarly, every O8 oligonucleotide
contained two tandem stop codons and a ClaI site or a NotI and
a ClaI site downstream of the unique cleavage site (Fig. 1B and
C). These features allow the cloning of each individual fragment
encoding a processing product of MSP-1 via BamHI/ClaI
cleavage. An exception to this is the DNA fragment encoding
p83, which contained the N-terminal signal peptide. In this case,
the 5'-end of O1 contains a MluI site immediately upstream of the
start codon of the full-size gene. This MluI site allows us to
transfer the assembled MSP-1-encoding DNA into any vector via
the unique MluI/ClaI or MluI/NotI sites. The synthesis of the
DNA fragment encoding p83 and the N-terminal signal sequence
required another modification of the scheme due to the size of
2025 bp of the oligonucleotide. The more N-terminal fragment I
was synthesized from a total of 12 oligonucleotides of 106-126 bp.
Eight of these produced the 5' portion (778 bp) of fragment I
according to Figure 2. The 445 bp 3' portion was synthesized
from four oligonucleotides according to steps I and II in Figure 2.
The complete fragment I was then generated by joining these two
fragments by PCR as outlined for steps III and IV in Figure 2. The
p83 DNA fragment II (954 bp) was generated from 10
oligonucleotides in an analogous procedure. Both p83 DNA
fragments I and II were, after sequence verification, combined via
a unique ScaI site at position 1145.
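
The pairwise fusion scheme (eight oligonucleotides to A-D, then E and F, then G) can be modelled in silico with the following Python sketch. It only tracks sequence overlaps and does not simulate PCR, stoichiometry or strand asymmetry; the function names, the fixed 18 nt overlap and the random 800 bp target are all assumptions made for illustration, and the rounds of pairing require the number of oligonucleotides to be a power of two.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_oligos(target, n_oligos=8, overlap=18):
    """Split `target` into overlapping oligos. O2, O4, ... are returned as
    lower-strand (reverse-complement) sequences; the last oligo absorbs the
    remainder of the target."""
    step = (len(target) - overlap) // n_oligos
    oligos = []
    for i in range(n_oligos):
        start = i * step
        end = len(target) if i == n_oligos - 1 else start + step + overlap
        top = target[start:end]
        oligos.append(top if i % 2 == 0 else revcomp(top))
    return oligos


def fuse(left, right, overlap=18):
    """Join two upper-strand fragments that share `overlap` bases."""
    if left[-overlap:] != right[:overlap]:
        raise ValueError("fragments do not share the expected overlap")
    return left + right[overlap:]


def assemble(oligos, overlap=18):
    """Fuse neighbours pairwise in rounds until one product remains."""
    frags = [o if i % 2 == 0 else revcomp(o) for i, o in enumerate(oligos)]
    while len(frags) > 1:
        frags = [fuse(frags[i], frags[i + 1], overlap)
                 for i in range(0, len(frags), 2)]
    return frags[0]


rng = random.Random(1)
target = "".join(rng.choice("ACGT") for _ in range(800))  # hypothetical target
oligos = design_oligos(target)
print([len(o) for o in oligos])
print(assemble(oligos) == target)
```

For an 800 bp target this design yields oligonucleotides of 115-121 nt; the Discussion notes that shorter oligonucleotides (optimally 70-90 nt) give lower error rates, which would correspond to a shorter target per assembly in this model.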

The full-size gene encoding the entire MSP-1 was obtained by
successively joining the various DNA fragments corresponding
to the processing products as outlined in Figure 1B. The
polynucleotide encoding the entire MSP-1S1 was generated by
combining the fragment encoding p83 and p30 with the fusion
products encoding p38 and p42 via the Eco47III cleavage site.
Sequences encoding signals for GPI anchoring were attached by
inserting properly modified polynucleotides via the SphI/ClaI
cleavage site whereas modifications of the N-terminal signal
peptide were generated by exchanging sequences upstream of the
unique HindIII site (data not shown). The synthetic gene which
includes the coding sequence for the authentic signal peptide and for
the GPI-mediated anchoring signal of the parasite is designated
msp-1^ (Fig. 1C). For expression studies in E.coli, the sequence
encoding the signal peptide was removed, yielding msp-1S2. This
was achieved by replacing the MluI-HindIII fragment by an
appropriately synthesized BamHI-HindIII oligonucleotide as
outlined in Figure 1B and C. The resulting gene starts with an ATG
followed by the codon for amino acid 21 (Fig. 1C). All the
sequences described herein were submitted to the EMBL
database under accession no. AJ131294.

Controlled expression of msp-1S2 DNA in E.coli and
isolation of the products

The msp-1S2 gene as well as portions thereof encoding p83, p30, p38, p42
and p19 were placed under the control of an IPTG-inducible
promoter in a vector that fuses six histidines to the N-terminus of the
expression products. Upon induction, all six proteins were produced
in E.coli and cell extracts were subjected to Ni2+ chelate
chromatography under denaturing conditions. Renaturation of the adsorbed
material via a urea gradient and subsequent elution with imidazole
hydrochloride yielded the six proteins in highly purified and soluble
form. The induction of MSP-1S2 synthesis in E.coli and the
electrophoretic characterization of purified p19-p83 are shown in Figure 3A.
The isolation procedure via an N-terminal histidine tag was chosen
since several fragments do not contain the epitope recognized by the
5.2 mAb antibody most suitable for immunoaffinity
chromatography. Moreover, high expression levels may lead to inclusion
bodies from which the respective proteins can be readily recovered
under denaturing conditions. Adsorption of such proteins to Ni2+
chelate supports facilitates their renaturation.
Figure 3. Expression of msp-1^ and msp-1 fragments in E.coli and in mammalian cells. (A) The msp-1S2 sequence was inserted into expression vector pDS56 where
it is under control of an IPTG-inducible promoter. Upon transfer into E.coli strain SG13009, the synthesis of MSP-1S2 can be induced (IPTG+). The arrow identifies
an induced product with a molecular weight of ~190 kDa. M denotes a molecular weight marker. (B) Electrophoretic analysis of IPTG-induced MSP-1 fragments
produced in E.coli and purified by Ni2+ chelate chromatography. The various fragments are indicated above the lanes. The designation of the protein fragments (34)
is not identical with their molecular mass calculated from the respective sequences (Fig. 1). In addition, some (p38, p30 and p19) show a migration behaviour that
does not allow a strict correlation with the molecular weight standard. M denotes a broad range molecular weight marker (New England Biolabs). Coomassie stained
4-12% PAA gradient gel. (C) Expression of MSP-1 in HeLa and CHO cells. HtTA-93/9 and CtTA-27/29 cells were grown to 40% confluency in the presence of Dox
(100 ng/ml) before the antibiotic was removed. After 24 h, cell extracts were prepared and analysed by western blot using mAb 5.2. Lanes 1 and 2 show extracts from
induced (-Dox) and uninduced (+Dox) cultures, lane 3 shows extracts from HtTA-1 and CtTA cells. (D) Electrophoresis of purified MSP-1 from HeLa cells and from
E.coli. The left lane shows a MSP-1S1 preparation obtained from cultures of HtTA-93/9 cells after immunoaffinity purification using the 5.2 mAb and the right lane
shows full-size material from E.coli obtained by Ni2+ chelate chromatography. Coomassie stained PAA gels.



Production of MSP-1 in mammalian cells 

It may be difficult to obtain in E.coli the properly folded form of
a protein like MSP-1 that is normally secreted and membrane 
anchored. We therefore also studied the expression of the 
synthetic genes encoding MSP-1^^ and MSP-1^^ in HeLa and 
CHO cells. Since preliminary results suggested that MSP-1 may 
interfere with the cellular metabolism (data not shown), its 
synthesis was controlled via the tetracycline regulatory system 
(31). Thus, the coding sequences were placed under the control 
of a bidirectional promoter (30) that is responsive to the 
tetracycline-controlled transcriptional transactivator (tTA). In 
these constructs, the expression of the msp-1 gene is co-regulated 
with the luciferase reporter gene which is used as a convenient 
screening tool for identifying stably transformed cell lines. HeLa 
and CHO cell lines for the controlled expression of the msp-1 
gene were generated by transfecting HtTA-1 and CtTA-1 cells, 
which constitutively produce tTA (31), with the appropriate 
plasmids. Clones that showed good tetracycline-dependent 
regulation of luciferase were selected and examined for MSP-1S1
synthesis. Several HeLa and CHO cell lines such as HtTA-93/9 
and CtTA-27/29 were established. They exhibited high regulation 
factors for luciferase (up to 1 0^-fold) and also co-regulate well the 
synthesis of MSP-1 (Fig. 3B). Interestingly, although the msp-1
gene encodes the genuine signal peptide, no protein could be 
recovered from the culture supernatant of both HtTA-93/9 and 
CtTA-27/29 cells, suggesting that the protein is not liberated 
under these conditions. HtTA-93/9 cells grown at preparative 
scale allowed us to isolate full-length MSP-1 from cell extracts 
via immunoaffinity chromatography (Fig. 3D) on columns 
prepared with mAb 5.2. 



Interaction of heterologously produced MSP-1 with 
monoclonal antibodies directed against the native protein 

To gain a first insight into conformational properties of MSP-1 as
isolated from E.coli and from mammalian cells, purified antigen
was reacted with a panel of MSP-1-specific monoclonal antibodies.
Of these antibodies, five are specific for the K1 prototype as
represented by the FCB-1 sequence, six are known to recognize
epitopes within the conserved parts of the molecule and two are
specific for MAD20 sequences. Nine of the antibodies react with
conformational rather than with sequential epitopes (35,36).
When preparations of MSP-1 from CHO cells and MSP-1S2
from E.coli were probed with the various antibodies in western
blots, a rather striking result was obtained. MSP-1S2 isolated from
E.coli interacted with all 11 antibodies that are specific for the K1
prototype. No interaction was seen with antibody 9.7 specific for
the MAD20 prototype or with antibody 12.1, which recognizes an
oligomorphic sequence of block IV which is not present in MSP-1
of the FCB-1 strain (Fig. 4 and Table 2). In contrast, MSP-1S1
isolated from CHO cells, while interacting with most K1-specific
antibodies, was not recognized by three of the mAbs that bind to
MSP-1 isolated from E.coli. Again, as expected, there was no
interaction between CHO-derived MSP-1S1 and monoclonal
antibodies 9.7 and 12.1 (Fig. 4 and Table 2).

DISCUSSION 

DNA of P.falciparum has an extraordinarily high AT content
which can exceed 90% in intragenic regions and may reach 75%
in coding sequences. The reasons for the strong preference of AT
over GC, most clearly revealed at the wobble position of codons
(Table 1), are not understood. A consequence of the high AT
Figure 4. Interaction of MSP-1 from E.coli and CHO cells with monoclonal antibodies. MSP-1S2 and MSP-1S1 isolated from E.coli and CtTA-27/29 cells,
respectively, were subjected to electrophoresis under non-reducing conditions and analysed by western blot using the monoclonal antibodies indicated. MSP-1S1 from
CHO cells (C) migrates slower than MSP-1S2 from E.coli (E) due to glycosylation in the eukaryotic expression system. (A) Antibodies directed towards conserved
portions of MSP-1. Interactions with antibodies specific for dimorphic regions are shown in (B) (K1) and (C) (MAD20).

Table 2. MSP-1 purified from E.coli and CHO cells, respectively, was subjected to gel electrophoresis under non-reducing conditions
Immunoblots with a panel of well-characterized monoclonal antibodies (32,35,36) were prepared and visually evaluated (Fig. 4).
Indirect immunofluorescence assays (IFA) were performed as described in Harlow and Lane (33). The number of + reflects the
intensity of immunofluorescence. ND, experiment not done due to limiting amounts of mAb 6.1. The western blots for mAbs 2.2 and
6.1 are not shown in Figure 4 since limiting amounts of the antibodies did not allow us to carry out this experiment more than once.



content of P.falciparum DNA is the failure of cloning and stably
maintaining large genes in E.coli, rendering in-depth studies of the
respective gene functions most difficult. In some cases, the biased
codon composition was even believed to hamper the expression
of P.falciparum genes in heterologous systems (37,38).

The synthesis of a 4917 bp long polynucleotide encoding the
190 kDa MSP-1 of the FCB-1 strain in a codon composition that
reduces the AT content to 55% has opened up new possibilities
for the study of this intriguing protein since the synthetic gene can
now be stably cloned and expressed in its entirety in E.coli as well
as in a variety of other heterologous systems. Several parameters
were reconciled in the design of the synthetic msp-1 gene. Thus,
it appeared sensible to place unique endonuclease cleavage sites
at or near positions where the protein is processed at the surface
of the mature schizont which permits separate cloning and
expression of the portions of the msp-1 gene that encode the
different processing products (Fig. 1). Although, with the
exception of the GPI anchor at the C-terminus, native MSP-1 of
P.falciparum appears not to be glycosylated (39), we have
conserved potential glycosylation sites as any change in the
amino acid composition may destroy epitopes important in the
host-parasite interaction. Moreover, choosing proper hosts for
MSP-1 synthesis can prevent the modification of such sites and,
finally, it will not be difficult to eliminate glycosylation sites at a
later stage should it become desirable.

The unique HindIII site near the 5'-end and the SphI site near
the 3'-end of the gene allow for switching signal peptides or
membrane anchoring signals. Indeed, besides the sequence
modifications shown here (MSP-1^ to MSP-1^, Fig. 1), we have
fused sequences encoding several other specific signal peptides
or anchoring signals with the gene (data not shown). Further
parameters that were considered are discussed in Results and
Materials and Methods.

For synthesizing an oligonucleotide of the size of the msp-1
gene, we have examined several strategies. The approach
described here is based on an asymmetric amplification process
that starts out with eight overlapping oligonucleotides and leads
to an end product of 600-800 bp, without the requirement of
isolating intermediates. Following this strategy, it is essential to
limit the size of the starting oligonucleotides to <120, optimally
to 70-90 nt, since the error rate of the PCR products increases for
longer oligonucleotides, most likely due to incomplete deprotection
or to modifications of nucleotides during chemical synthesis. The
complete msp-1^ gene and all the synthetic intermediates
obtained by this procedure were stably cloned in E.coli, confirming
that the parameter responsible for the instability of large
P.falciparum genes in E.coli is the high AT content.

First expression studies with the synthetic sequences in E.coli
revealed that MSP-1S2 was readily produced as an intracellular
protein. Moreover, N-terminal fusion of a histidine tag allows its
rapid isolation in soluble form via affinity chromatography.
Encouragingly, the examination of such MSP-1S2 preparations
with a panel of monoclonal antibodies, of which several are
considered to recognize conformational epitopes, indicates that at
least some portions of the protein are properly folded under the
conditions used. Production and purification of MSP-1 fragments
corresponding to the various processing products of the native
protein are even more efficient and all the fragments are obtained as
soluble proteins, a prerequisite for structural and functional studies.

Despite the promising results in E.coli, proper folding of a
complex protein like MSP-1 that is transported to the surface of
the parasite may be more readily achieved in a eukaryotic system
under conditions of secretion and possibly membrane anchoring.
We have, therefore, begun to study the expression of the msp-1S1
gene in mammalian cells. After initial studies suggested that
synthesis of MSP-1 may negatively affect the growth of HeLa
cells, we placed the gene under tetracycline control and generated
stable HeLa and CHO cell lines where MSP-1S1 synthesis is
stringently controlled and can be induced over several orders of
magnitude. Full-size protein is recovered from cell extracts upon
induction. Since the msp-1S1 gene encodes the genuine signal
peptide of the parasite, which is quite similar to other eukaryotic
signal sequences, one might anticipate the secretion of the protein
into the culture supernatant. We failed, however, to detect any
secreted material (data not shown) and are presently analysing in
which compartment of the cell the protein accumulates. The
isolated protein migrates distinctly slower in an electric field than
the protein produced in E.coli and thus it can be assumed that it
enters the endoplasmic reticulum and the Golgi pathway where
it is glycosylated. Interestingly, full-size MSP-1 can be isolated
from cell extracts by affinity chromatography with mAb 5.2, i.e. by
an antibody that recognizes a conformational epitope near the
C-terminus. This indicates again that at least this rather critical
domain of the protein may be in a conformation identical to the
native one. This conclusion is supported by the reactivity of the
antigen with mAbs 12.10, 7.5 and 2.2, all of which have been
mapped to conformation-dependent epitopes within p19 of
MSP-1 (Table 2).



The availability of a msp-1 gene that can be transferred, stably
maintained and expressed in various biological systems will
advance the elucidation of its role in the parasite's life cycle. This
will include structural analysis of the intact protein as well as of 
its processing products. Of particular interest will be the analysis 
of the interaction of MSP-1 and its processing products with 
erythrocytes. 

Several approaches are under way in our laboratory. For 
example, we have successfully placed full-size MSP-1 as well as 
portions thereof onto the surface of HeLa cells (P.Burghaus,
manuscript in preparation) and Toxoplasma gondii where they are
anchored by a GPI moiety (I.Turbachova et al., manuscript in
preparation). Interaction with our collection of mAbs suggests
that the surface-exposed proteins have assumed the natural
conformation. These systems are opening up a variety of
experimental strategies aimed at the analysis of MSP-1 function.
This will include not only the interaction of MSP-1 or any portion
thereof with the surface of erythrocytes but also questions
concerning the maturation of the MSP-1 precursor. Thus,
heterologously produced MSP-1 or portions thereof may constitute useful
substrates for proteases involved in this process (41). Finally, it 
will be possible to study the interaction between purified MSP-1 
fragments representing the natural processing products which 
may allow reassembly of the MSP-1 complex in vitro. Together, 
such studies may lead to new insights into the early phases of 
erythrocyte invasion and reveal new targets for interfering with 
P.falciparum infection at the blood stage. 

The synthetic msp-1 gene will also facilitate the rigorous
examination of the protective potential of MSP-1 or any portion
thereof when used as an experimental vaccine. Most of the recent
MSP-1-based vaccination trials focused on the C-terminal
portion of the protein, in part for technical reasons. While such
studies have clearly identified this region of the molecule as
promising, the analysis can now be extended throughout the
entire molecule as there is little reason to exclude any portion of
this surface protein from such examination, particularly considering
the contribution of cellular responses towards malaria immunity
where MSP-1 could play a role (40). The FCB-1-derived amino
acid sequence encoded in our synthetic gene is highly homologous 
to MSP-1 of the Colombian FVO strain which is well adapted to 
Aotus monkeys. Novel experimental vaccines that include the 
entire protein or any portion thereof can now be tested in this 
animal model and first monkey trials with vaccines based on 
heterologously expressed proteins as well as a recombinant 
attenuated T.gondii are under way. The sequence chosen here
belongs to the K1 prototype. It diverges maximally from the
MSP-1 sequence of the 3D7 strain, a representative of the 
MAD20 prototype. The synthesis of the gene encoding the 3D7 
MSP-1 is presently being completed in our laboratory. Together, 
the two genes will allow comparative structural and immunologic 
studies. In particular, they will permit vaccination studies with 
heterologous challenge infections in Aotus monkeys. Moreover, the
availability of unlimited amounts of MSP-1 proteins representing
the processing products or any other portion of the K1- and the
MAD20-derived MSP-1 will make a detailed analysis of the
humoral response in populations where P.falciparum is endemic
feasible. Such analyses may lead to more reliable correlations
between patterns of humoral response and susceptibility towards
infection and disease as was suggested in earlier studies (10,11),
possibly allowing for the development of diagnostic tools with
predictive value. Indeed, analysing sera from recent immunization
trials with Aotus monkeys using the various MSP-1 fragments
has revealed clear correlations between antibodies directed
towards certain areas of the protein and protection (R.Tolle et al.,
manuscript in preparation). 

Finally, since the well-characterized 3D7 strain is being used in 
human trials for challenge infections, immunization of humans 
followed by homologous or heterologous challenge appears 
feasible in the near future. Together, such studies should increase 
our understanding of the role of MSP-1 in the parasite's life cycle,
the basis of its dimorphic nature and its potential as a component 
in a subunit vaccine. 
Engineering a bioluminescent indicator for cyclic AMP-dependent

cDNA coding for the luciferase in the firefly Photinus pyralis was amplified in vitro to generate cyclic AMP-dependent
protein kinase phosphorylation sites. The DNA was transcribed and translated to generate light-emitting protein. A valine
at position 217 was mutated to arginine to generate a site RRFS and the heptapeptide kemptide, the phosphorylation site
of the porcine pyruvate kinase, was added at the N- or C-terminus of the luciferase. The proteins carrying phosphorylation
sites were characterized for their specific activity, pI, effect of pH on the colour of the light emitted and effect of the
catalytic subunit of protein kinase A in the presence of ATP. Only one of the recombinant proteins (RRFS) was 
significantly different from wild-type luciferase. The RRFS mutant had a lower specific activity, lower pH optimum, 
emitted greener light at low pH and when phosphorylated it decreased its activity by up to 80%. This latter effect was 
reversed by phosphatase. This recombinant protein is a good candidate to measure for the first time cyclic AMP- 
dependent phosphorylation in live cells. 



INTRODUCTION 

A universal feature of eukaryotic cells is the ability of 
physiological agonists, such as hormones, growth factors and 
neurotransmitters, components of the body's defence system, 
non-host antigens and other pathogens and drugs, to interact 
with the plasma membrane and trigger molecular events within 
the cell. These agents initiate a molecular sequence that starts 
with the generation of an intracellular signal, such as Ca2+, cyclic
AMP, inositol trisphosphate or diacylglycerol, and ends with a 
physiological or pathological event in the cell (Campbell, 1983; 
Berridge & Irvine, 1989). These events include movement, 
secretion, transformation, division, defence and death. The 
timing and magnitude of the end response in each cell depends on 
the timing and location of both the intracellular signals and the 
covalent modifications they induce. A particular cell will only
undergo an end response if the right sequence of molecular 
thresholds has occurred (Campbell, 1983, 1988, 1990). 

Measurement and imaging of intracellular Ca2+ using fluorescent
and bioluminescent indicators (Campbell, 1983; Cobbold &
Rink, 1987) has established that one explanation for gross
heterogeneity in individual cell responses is variation in the
timing and the location of the intracellular Ca2+ signal. In
neutrophils, for example, four subpopulations have been defined,
including one group showing no response at all (Hallett et al.,
1990; Davies et al., 1991). A major problem in elucidating the
molecular basis of heterogeneity within a cell population is the
lack of a method for measuring and manipulating covalent
modification of proteins in live cells. The purpose of the work 
reported here was to engineer cyclic AMP-dependent protein 

100: TCATCGCTGAATACAGTTAC (3' end antisense)
101: GGTAAAATGGAAGACGCCAAAAAC (5' end sense)
105: CACCTAATACGACTCACTATAGGGAGAATGGAAGACGCCAAAAAC (5' end antisense including the T7 promoter)
107: AGAACTGCCTGCCGCAGATACTCGCA (5' end sense, underlined bases generate R codon)
108: TGCGAGAATCTGCGGCAGGCAGTTCT (3' end antisense, underlined bases generate R codon)
113: CCTTGTCGACTTAGCCCAGGGAGGCCCGCCGCAGCAATTTGGACTTTCC (3' end antisense with 21 bases coding for kemptide, a stop codon and a SalI restriction site)
114: GGCCTCCCTGGGCGAAGACGCCAAAAAC (5' end sense, part of kemptide)
T7-K: CACCTAATACGACTCACTATAGGGAGAATGCTGCGGCGGGCCTCCCTGGGC (5' end sense, clamp, T7 promoter and part of the coding sequence for kemptide)

Abbreviations used: CL count, chemiluminescence count; KNt, luciferase with kemptide at N-terminus; KCt, luciferase with kemptide at C-terminus.

* To whom correspondence and reprint requests should be addressed. 
kinase phosphorylation sites into firefly luciferase, such that a
change in colour and/or light intensity occurred after phosphoryl- 
ation and dephosphorylation (Campbell, 1989). 

Benzothiazole luciferases occur only in luminous beetles. They
contain approx. 550 amino acids, and require ATP, Mg2+ and
O2, as well as a common luciferin, to generate light (Campbell,
1988). Just a few amino acid changes can cause the colour to shift
from green to green-yellow, yellow or red (Wood et al., 1989a,b).
Recognition sites for protein kinase A (Cohen, 1988) have been
added to α-interferon to allow high-specific-activity labelling for
binding studies (Li et al., 1989). In a previous study the heptapeptide
kemptide (LRRASLG) (Zetterqvist et al., 1976; Kemp
et al., 1977) was chemically coupled to extracted luciferase from
the firefly Photinus pyralis. Photinus luciferase (Photinus-luciferin:oxygen
4-oxidoreductase; EC 1.13.12.7) emits yellow
light with a peak intensity at 565 nm. The coupled kemptide
shifted the colour of the light emitted to the red and phosphorylation
shifted it even further (Jenkins et al., 1990). Here PCR
was used followed by transcription-translation in vitro (Sala-Newby
et al., 1990a,b) to change an amino acid sequence VRFS
(217-220) (de Wet et al., 1987) to RRFS, or to add kemptide to
the N- or C-terminus of the protein.
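
The logic of the two designs can be illustrated with a short motif scan. This is only a sketch: it assumes the minimal protein kinase A consensus Arg-Arg-X-Ser/Thr, which is a simplification, and the function name `find_pka_sites` is hypothetical. The three peptides are the ones named in the text (VRFS, RRFS and kemptide, LRRASLG).

```python
import re

# Assumed minimal consensus for protein kinase A: Arg-Arg-X-Ser/Thr.
# The lookahead lets overlapping candidate sites be reported as well.
PKA_MOTIF = re.compile(r"(?=(RR[A-Z][ST]))")


def find_pka_sites(peptide: str) -> list[tuple[int, str]]:
    """Return (1-based start position, matched tetrapeptide) for each candidate site."""
    return [(m.start() + 1, m.group(1)) for m in PKA_MOTIF.finditer(peptide)]


for name, seq in [
    ("wild-type residues 217-220", "VRFS"),
    ("mutant residues 217-220", "RRFS"),
    ("kemptide", "LRRASLG"),
]:
    print(name, find_pka_sites(seq))
```

Under this assumption the wild-type tetrapeptide gives no match, whereas the single V-to-R change and the kemptide heptapeptide each present one candidate site.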



MATERIALS AND METHODS 
Materials 

Oligonucleotide primers were prepared using an Applied
Biosystems 381A DNA synthesizer and purified as 'trityl-on'
oligonucleotides (100, 101) or 'trityl-off' (105, 107, 108, 113, 114,
T7-K). Their sequences were as follows:
The coding sequence for firefly luciferase was isolated from a
cDNA library (Sala-Newby et al., 1990a,b). A 2400 bp SalI fragment
was used as the target for amplification. Amplitaq DNA
polymerase was from Perkin-Elmer Ltd., U.K. T7 RNA polymerase
was from Promega, nucleotides and Sephacryl S100 were
from Pharmacia and Centricon 100 cartridges were from Amicon,
U.K. [γ-32P]ATP (10-50 Ci/mmol), [α-32P]UTP (3000 Ci/mmol),
stabilized [35S]methionine (1000 Ci/mmol), RNAase inhibitor
and rabbit reticulocyte lysate (N90) were purchased from
Amersham International plc. Restriction enzymes, alkaline
phosphatase (24 units/μl) and luciferin were from Boehringer
Corp. Low-protein-binding ultrafiltration units, Ultrafree MC,
were from Millipore Corp. Protein kinase A inhibitor (P-3294)
and kemptide were from Sigma Chemical Co. All other AnalaR-
grade reagents were from Sigma Chemical Co. and BDH
Chemicals. The catalytic subunit of cyclic AMP-dependent
protein kinase was generously given by Dr. K. J. Murray of
Smith Kline Beecham, Welwyn, Herts., U.K.

Preparation of DNA fragments 

Addition of the T7 RNA polymerase promoter
(TAATACGACTCACTATAGGGAGA) (Stoflet et al., 1988)
and the DNA sequence coding for kemptide (CTGCGGCGGGCCTCCCTGGGC),
as well as mutation of two bases within
the luciferase cDNA, were carried out using PCR (Saiki et al.,
1988) as previously described (Sala-Newby et al., 1990b). Firefly
(Photinus pyralis) cDNA (4 ng/ml) was amplified in a solution
containing 10 mM-Tris/HCl (pH 8.3), 50 mM-KCl, 2 mM-MgCl2,
0.01 % (w/v) gelatin, 0.2 mM each of the four deoxynucleoside
triphosphates, 0.5 of each oligonucleotide primer and 40
units of Amplitaq DNA polymerase/ml. The cycling reactions
were carried out in a Perkin-Elmer thermal cycler. Each of the 25
cycles consisted of 1 min at 94 °C, 1 min at 55 °C, 2 min at
70 °C plus a 5 s extension on each cycle. Klenow fragment of
Escherichia coli DNA polymerase (40 units/ml) was added after
the completion of the 25 cycles, and incubated for 30 min at
37 °C.
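
As a consistency check on the primer design, the sequences of primers T7-K and 113 listed under Materials can be inspected in silico. The sketch below is illustrative only: the helper names are hypothetical, the codon table is deliberately limited to the codons needed here, and the kemptide-coding stretch is taken as the 21 bases that follow the ATG in primer T7-K.

```python
# Partial codon table: only the codons that occur in the kemptide-coding stretch.
CODON_TABLE = {"CTG": "L", "CGG": "R", "GCC": "A", "TCC": "S", "GGC": "G"}

T7_PROMOTER = "TAATACGACTCACTATAGGGAGA"
KEMPTIDE_DNA = "CTGCGGCGGGCCTCCCTGGGC"  # 21 bases following the ATG in primer T7-K
PRIMER_T7K = "CACCTAATACGACTCACTATAGGGAGAATGCTGCGGCGGGCCTCCCTGGGC"
PRIMER_113 = "CCTTGTCGACTTAGCCCAGGGAGGCCCGCCGCAGCAATTTGGACTTTCC"


def translate(dna: str) -> str:
    """Translate an in-frame DNA string using the partial codon table above."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


print(translate(KEMPTIDE_DNA))                 # heptapeptide encoded by the 21 bases
print(T7_PROMOTER in PRIMER_T7K)               # promoter present in the sense primer
# On the sense strand, primer 113 should read: kemptide codons, stop codon, SalI site.
print(KEMPTIDE_DNA + "TAA" + "GTCGAC" in reverse_complement(PRIMER_113))
```

The 21 bases translate to LRRASLG, and the reverse complement of primer 113 carries the same 21 bases followed by a TAA stop codon and the GTCGAC (SalI) site, in agreement with the primer annotations.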

The final product was extracted once with phenol/
chloroform/3-methylbutan-1-ol (25:24:1, by vol.) and precipitated
with 2 vol. of 7.5 M-ammonium acetate plus 2.5 vol. of
ethanol. The DNA concentration was assessed visually from
ethidium bromide-stained agarose gels by comparison with the
bands of a standard DNA (Sambrook et al., 1989).

Wild-type firefly luciferase DNA preceded by the T7 RNA
polymerase promoter was prepared using oligonucleotide primers
100 and 105 (see under 'Materials'). The kemptide coding
sequence was added to the 3' end of the firefly cDNA using
primers 105 and 113. The incorporation of the kemptide coding
sequence at the 5' end and the 2 bp change coding for the
mutation V-217→R were carried out in two stages (Higuchi,
1990). After the first amplification the primers were removed by
filtration through Centricon 100 cartridges (Higuchi, 1990). The
first stage of the introduction of kemptide at the N-terminus was
carried out using oligonucleotide primers 114 and 100. The
resulting DNA (4 ng/ml) was amplified for 25 cycles in the
presence of oligonucleotide primers T7-K and 100 to produce the
final product. The first stage of the preparation of the RRFS
variant generated two fragments: the 5' end fragment (643 bp) was
generated by amplification with oligonucleotide primers 101 and
108 and the 3' end (1018 bp) with oligonucleotide primers 107
and 100. In the second stage the two fragments were mixed in
equimolar amounts (2 μg of total DNA/ml), denatured (1 min at
94 °C) and allowed to reanneal by decreasing the temperature at
5.7 °C/min to 37 °C in amplification mixture containing primers
105 and 100 followed by 1 min extension at 72 °C. Eight to
twelve cycles of amplification under normal conditions followed.
Formation of luciferase in vitro

PCR products (0.5-1.5 μg/25 μl of incubation mixture) were
transcribed as previously described (Sala-Newby et al., 1990).
The RNA capped using 0.5 mM-m7G(5')ppp(5')G and 0.1 mM-GTP
was precipitated twice with 0.2 vol. of 7.5 M-ammonium
acetate and 2.5 vol. of ethanol. The RNA (1-100 ng) in 2 μl
of 10 mM-Tris/HCl, 1 mM-EDTA (pH 7.4), 3 μl of potassium
acetate and magnesium acetate to optimize their concentration
(90-110 and 1.6-2.0 mM final concns. respectively) and 5 μl of
rabbit reticulocyte lysate N90 were incubated for 1 h at 30 °C,
and luciferase activity as chemiluminescent (CL) count was
measured in a home-built luminometer (Campbell, 1988) for 10 s
at room temperature in 229 μl of 20 mM-Tris/acetate/0.3 mM-
dithiothreitol/0.2 mM-EDTA/1 mg of BSA/ml/12 mM-magnesium
acetate/1.5 mM-ATP, pH 7.75. The reaction was started by
addition of luciferin to 0.2 mM final concentration (1 ng of
extracted luciferase yields 2.1 x 10* and 2.9 x 10** CL counts/10 s
in the presence of rabbit reticulocyte lysate and buffer respectively).
The amount of protein synthesized was measured by
including 15 μCi of [35S]methionine/10 μl of translation cocktail.
Proteins were separated on SDS/9% (w/v) polyacrylamide gels
under reducing conditions (Laemmli, 1970) followed by fluorography
and exposure to preflashed X-ray film. The luciferase
bands were excised from the gel, radioactivity was measured in
a liquid-scintillation counter and the amount of protein was
estimated taking into account the concentration of methionine
(28 μM) in the lysate.

Phosphorylation of proteins 

The proteins were synthesized in 100-250 μl of rabbit reticulocyte
mixture, precipitated in 64 % saturated ammonium sulphate,
resuspended in 100 μl of 50 mM-Tris/Mes/1 mM-EDTA/0.3 mM-dithiothreitol
(pH 7.8 for normal luciferase and for protein with kemptide
at the N-terminus (KNt) or at the C-terminus (KCt), and pH 7.2
for RRFS), and subjected to gel filtration on a column
(0.7 cm x 20 cm) packed with Sephacryl S100 and equilibrated in
the corresponding buffer. Active fractions were pooled and
concentrated by ultrafiltration. Protein was measured by the
method of Lowry et al. (1951). BSA (fraction V) was used as
standard.

The phosphorylation was carried out in a mixture containing
20 mM-Mes, 60 mM-sodium glycerol 2-phosphate, 30 mM-NaF,
10 mM-magnesium acetate, 1 mM-EDTA, 1 mg of BSA/ml, 1 μg
each of leupeptin and pepstatin/ml and 125 μM-ATP, pH 6.8.
Active fractions from the gel filtration (0.8-1.2 mg of protein/ml,
of which approximately 0.1 % was luciferase) were added together
with 0.5 μl of purified catalytic subunit of protein kinase A
or catalytic subunit diluent (0.5 M-potassium phosphate/0.1 %
Tween-20, pH 6.8) per 40 μl of mixture [the catalytic subunit
can transfer 7 mmol of 32P/min per μl using 20 μM-malantide
as a substrate, as in Murray et al. (1990)]. The incubations
were carried out at 30 °C for 10-20 min. Kemptide was
also phosphorylated in the presence of rabbit reticulocyte lysate that
was gel-filtered under the same conditions as the variants, in
the presence of [γ-32P]ATP (Livesey & Martin, 1988). The
phosphorylated proteins were stored on ice until ready to assay.



Dephosphorylation of the luciferase

When the phosphorylated proteins were to be treated with
alkaline phosphatase, the phosphorylation buffer contained neither
sodium glycerol 2-phosphate nor NaF. For the dephosphorylation
reaction 0.7 unit of alkaline phosphatase/μl and 0.01 m^^
protein kinase inhibitor (Cheng et al., 1985) were added to the
phosphorylation mixture.
Effect of pH on activity and colour of the light emitted by the
variants

Chemiluminescence from the enzymes was measured by diluting
them 40-fold into an assay mix with pH ranging from 6 to
9 containing mixtures of 50 mM-Mes and 50 mM-Tris to give
the desired pH, 0.3 mM-dithiothreitol, 0.2 mM-EDTA, 1 mg of
BSA/ml, 12 mM-magnesium acetate, 0.2 mM-luciferin and
1.5 mM-ATP. Colour was assessed using a dual-wavelength
luminometer fitted with narrow-band pass-interference filters,
with a maximal transmission at 603 nm (red) and 545 nm (green)
of 30.2 and 35.3 % respectively (Campbell et al., 1985). The light
produced by the luciferase reactions was measured simul- 
taneously at the two wavelengths and the ratio of activity at 
603 nm to activity at 545 nm was calculated. The ratio was 
corrected for the transmission of the filters, but not for the 
spectral sensitivity of the photomultiplier tubes, which at 603 nm 
was approximately 10% of its value at 545 nm. 
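
One plausible reading of the filter correction is sketched below. It assumes that each channel is simply divided by the peak transmission of its filter (30.2 % at 603 nm, 35.3 % at 545 nm) before the ratio is taken; the text does not spell out the exact procedure, and the function name and the example counts are hypothetical. As in the text, no correction for photomultiplier sensitivity is applied.

```python
# Peak filter transmissions quoted in the text.
T_RED_603 = 0.302
T_GREEN_545 = 0.353


def corrected_ratio(counts_603: float, counts_545: float) -> float:
    """Ratio of red to green light after dividing each channel by its filter transmission."""
    if counts_545 <= 0:
        raise ValueError("counts at 545 nm must be positive")
    return (counts_603 / T_RED_603) / (counts_545 / T_GREEN_545)


# Hypothetical raw counts, chosen only to show the size of the correction.
raw_603, raw_545 = 1200.0, 9000.0
print(raw_603 / raw_545)                  # uncorrected ratio
print(corrected_ratio(raw_603, raw_545))  # ratio corrected for filter transmission
```

Because the two transmissions are similar, the correction scales the raw ratio by a constant factor of 0.353/0.302 (about 1.17) and does not change the direction of any colour shift.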



RESULTS 

Characterization of PCR products 

The PCR was used to amplify cDNA coding for wild-type
firefly luciferase and for variants containing putative protein
kinase A-recognition sites at position 217-220 (referred to as
RRFS), kemptide at N-terminus (referred to as KNt) or kemptide
at C-terminus (referred to as KCt).

The PCR products were characterized using three criteria: size
on agarose-gel electrophoresis, formation of 32P-labelled mRNA
of the correct size on glyoxal/agarose-gel electrophoresis and
translation in vitro of the mRNA to form light-emitting protein.
This protein was compared with wild-type synthetic luciferase
for molecular mass, specific activity, pH profile and colour and
with firefly tails luciferase when appropriate. The PCR generated
a single DNA band apparently of the correct predicted size for
all the recombinant DNA, i.e. for wild-type and RRFS the
predicted size is 1682 bp, for KCt the predicted size is 1703 bp,
and for KNt a major band is present at the predicted size 1703 bp
with a minor band at 380 bp (Fig. 1). The yields were 1-3 μg of
DNA/0.1 ml of reaction mixture. No bands were seen without
addition of primers or when template DNA was omitted.

Transcription of the PCR products with T7 RNA polymerase
generated a major band of 32P-labelled capped mRNA of the
correct length, i.e. 1650 bp. Small quantities of longer and
shorter mRNA products were observed (Fig. 2). The latter could
not generate light-emitting protein because removal of 12 amino
acids at the C-terminus reduces the activity by 99 % (Sala-Newby
et al., 1990b). The yields of capped mRNA were 4-8 molecules
of RNA per DNA molecule, the lower yields corresponding
to DNA coding for KNt. No mRNA was detected in gels when
the DNA transcribed lacked T7 promoter in spite of the detection
of [32P]UTP incorporation equivalent to 0.05 molecule of RNA
per DNA molecule. mRNA generated light-emitting protein
(Table 1) and a major 35S-labelled protein band of the expected
molecular mass (60 kDa) on SDS/PAGE (Fig. 3).

Characterization of the recombinant variants

The new proteins were characterized using three criteria:
specific activity (i.e. CL counts/μg of RNA and CL counts/ng of
protein), effect of pH 6-9 on their activity and colour of the light
emitted as assessed by the ratio of chemiluminescence at 603 nm
to 543 nm.

The CL counts/10 s per ng of protein indicated the effect the
modifications had on the catalytic activity of the protein.
Luciferase with kemptide at the N- or C-terminus had a specific
activity similar to that of the wild-type and extracted luciferase.



However, the specific activity of the RRFS variant was only 
10-15% of that of wild-type luciferase (Table 1). The specific 
chemiluminescent activity estimated per μg of RNA differed
between the variants by a greater factor than the activity per ng




Fig. 1. Agarose-gel electrophoresis of cDNAs prepared by PCR 

Wild-type firefly cDNA was amplified with oligonucleotides 105-100 
(lane 2). The fragments of DNA that were used to prepare RRFS 
variant are shown in lanes 3 and 4. They correspond to PCR 
products obtained using oligonucleotide primers 107-100 (3' end) 
and 101-108 (5' end). The firefly RRFS cDNA is shown in lane 5; 
it was prepared by the amplification of DNA from lanes 3 and 4 in 
the presence of primers 105-100. cDNAs coding for variants with 
kemptide at N- and C-terminus are shown in lanes 6 and 7
respectively. Size markers were HindIII-digested λ DNA (lane 1).







Fig. 2. Transcription products of the cDNAs 

cDNAs produced by PCR were transcribed using T7 RNA polymerase
and the 32P-labelled mRNAs were separated by glyoxal/
agarose-gel electrophoresis, dried and autoradiographed as described
in the Materials and methods section. The size markers were
32P-labelled HindIII-digested λ DNA (lanes 1 and 4) (Sambrook
et al., 1989). RNAs for the recombinant proteins are shown as
follows: RRFS (lane 2), wild-type (lane 3), KNt (lane 5), KCt (lane
6).



Fig. 3. Synthesis in vitro of recombinant proteins 

mRNAs were translated using rabbit reticulocyte lysate in the
presence of [35S]methionine. The proteins were separated by
SDS/PAGE under reducing conditions. RRFS (lane 1), wild-type
(lane 2), KCt (lane 3), KNt (lane 4) and the products in the absence
of mRNA (lane 5) are shown. Prestained molecular-mass (Da)
markers were: α2-macroglobulin (180000), β-galactosidase (116000),
fructose 6-phosphate kinase (84000), pyruvate kinase (58000),
fumarase (48500), lactate dehydrogenase (36500) and triose phosphate
isomerase (26600).



Table 1. Specific activity of the luciferases

The values in parentheses indicate the number of independent DNA
amplifications. Results are expressed as means±s.e.m. and when
only two determinations were made the range is given.

                       CL counts/10 s        CL counts/10 s        Protein/RNA
Variant                per ng of protein     per μg of RNA         (mol/mol)

Wild-type synthetic    (2.6±0.5)x 10* (3)    (3.0±0.6)x 10' (5)    1.04
RRFS                   (3.0±1.0)x10* (6)     (3.9±1.3)xW (6)       0.11
KNt                    (2.1-3.7) x 10*       (4.2-4.7) x 10*       0.14
KCt                    (2.1-2.0) x 10"       (3.7-8.9) x 10«       0.27
Extracted              2.1x10''







of protein. All the variants with phosphorylation sites showed
less activity per μg of RNA than the wild-type variant (Table 1).
The number of molecules of protein produced per molecule of
RNA in the translation assay, estimated from the two specific
activities, confirmed that the normal synthetic wild-type enzyme
yielded up to nine times more copies of protein per molecule of
RNA than the other three (Table 1). The lower levels of translation
shown could reflect differences in the secondary structure of the
mRNA with the translation assay being optimized for the wild-type
synthetic RNA. An additional effect due to a change in the codon usage
cannot be ruled out. The RNA coding for KNt also contained an
RNA band at approx. 500 bp (Fig. 2, lane 6) which would not
translate into active protein and this would reduce the specific
activity estimated per total RNA.
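
The size of the translation difference can be read directly from the Protein/RNA column of Table 1. The short calculation below is only a worked illustration using those four values as printed; the variable names are hypothetical.

```python
# Protein/RNA (mol/mol) values as read from Table 1.
protein_per_rna = {"wild-type": 1.04, "RRFS": 0.11, "KNt": 0.14, "KCt": 0.27}

# Fold difference in translation yield of the wild-type relative to each variant.
fold_vs_wild_type = {
    name: protein_per_rna["wild-type"] / value
    for name, value in protein_per_rna.items()
    if name != "wild-type"
}

for name, fold in fold_vs_wild_type.items():
    print(f"{name}: wild-type yields {fold:.1f}-fold more protein per RNA")
```

The ratios come out at roughly 9.5 (RRFS), 7.4 (KNt) and 3.9 (KCt), so the largest difference is about ninefold.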

Effect of phosphorylation and dephosphorylation 

Initial experiments using kemptide as a substrate for protein 
kinase A indicated that rabbit reticulocyte lysate inhibited 
phosphorylation of kemptide (results not shown). Gel filtration 
Fig. 4. Effect of phosphorylation-dephosphorylation on the activity of
RRFS luciferase (V-217→R)

Partially purified variant RRFS was incubated at 30 °C as described
in the Materials and methods section in the presence of kinase
diluent only (○), protein kinase A catalytic subunit with (♦) and
without phosphatase inhibitors. At 24 min alkaline phosphatase
and protein kinase A inhibitor were added (↓). Samples were taken
from the tubes at various times up to 90 min, diluted immediately
40-fold into luciferase assay mixture pH 7.2 and the chemiluminescence
was measured for 10 s. Activity at time 0 was measured
before addition of kinase. Results are presented as percentage of 
activity at time 0. Each point is a mean of two and a representative 
experiment is shown. Experiments were carried out with protein 
produced from two separate PCRs. 



removed the inhibitory activity and 1.7±0.2 (n = 3) nmol of
phosphate (1.3-1.4 nmol when phosphatase inhibitors were omitted)
was incorporated into kemptide after 20 min incubation per
40 μl of reaction mixture.

Incubation of the RRFS luciferase variant with protein kinase 
A catalytic subunit in the presence of ATP resulted in a decrease 
in its catalytic activity to 19±4% (n = 5) within 20 min, and
remained at this level for the duration of the experiment, i.e.
90 min (Fig. 4). When alkaline phosphatase was added to the 
phosphorylated RRFS luciferase, the chemiluminescent activity 
increased to control levels within 30 min. No effect of protein 
kinase A was observed on the activity of wild-type luciferase, nor 
on recombinant luciferases with kemptide at the N- or C- 
terminus, at any pH (Figs. 5a and 5b). 
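
The way such time courses are expressed (each point the mean of duplicate readings, given as a percentage of the activity at time 0) can be sketched as follows. The counts in the example are invented for illustration and are not data from the experiments; the function name is hypothetical.

```python
from statistics import mean


def percent_of_time_zero(duplicates_by_time: dict[int, tuple[float, float]]) -> dict[int, float]:
    """Mean of duplicate CL counts at each time, as a percentage of the time-0 mean."""
    baseline = mean(duplicates_by_time[0])
    return {
        minutes: 100.0 * mean(readings) / baseline
        for minutes, readings in sorted(duplicates_by_time.items())
    }


# Hypothetical duplicate CL counts (10 s) at 0, 20 and 90 min after adding kinase.
example_counts = {0: (1000, 1040), 20: (190, 198), 90: (185, 203)}
normalised = percent_of_time_zero(example_counts)
for minutes, percent in normalised.items():
    print(f"{minutes:>3} min: {percent:.1f} % of initial activity")
```

With these illustrative numbers the activity falls to about 19 % of its initial value, i.e. a decrease of roughly 80 %, which is the magnitude reported for the RRFS variant.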

Attempts to demonstrate a change in pI between the various
recombinant luciferases, using isoelectric focusing, were unsuccessful,
because of artifactual bands generated from the
focusing procedure. However, the major band for recombinant
and extracted luciferase had the same pI (6.6).

Recombinant wild-type luciferase had a pH optimum of 
around 7.8, identical with that of the extracted luciferase (Fig. 
6a). Addition of kemptide at the N- or C-terminus appeared to 
have no effect on the pH profile (Figs. 5b and 6a). Similarly these 
three recombinant proteins had similar colour shifts to the red
at acidic pH (Fig. 6b). In contrast the RRFS mutant luciferase
showed both an altered pH profile with optimum activity at
pH 7.2 (Figs. 5a and 6a) and a shift in colour to the green at acid
pH (Fig. 6b). The inhibitory effect of phosphorylation on RRFS
activity was most marked at its optimum pH (Fig. 5a). The ratio
of light emission at 603 nm/543 nm measured at pH 7.5 changed
from 0.16 to 0.32 after phosphorylation, indicating that the light
became redder. Since the detection system used for these
measurements was less sensitive to red light, this red shift may
Fig. 5. Effect of pH and protein kinase A on the activity of recombinant
luciferases

The variants were incubated for 15 min in the presence of protein
kinase A (closed symbols) or kinase diluent only (open symbols) as
described in the Materials and methods section. The resulting
enzyme activity was then measured in triplicate (mean±s.e.m.) at
various pH values. (a) Wild-type and RRFS. (b)
KNt (▲, △) and KCt (▼, ▽).



partly explain the decrease in activity for the phosphorylated 
enzyme. 

DISCUSSION 

The results presented demonstrate that DNA amplification
coupled to transcription-translation in vitro allowed the generation
and characterization of firefly luciferase variants containing
phosphorylation sites. Only one of the variants (RRFS)
showed a decrease in its activity when incubated with the catalytic
subunit of protein kinase A in the presence of ATP, and the effect
was reversed by addition of alkaline phosphatase (Figs. 4, 5a and
5b). The enzyme activity per unit of protein of the wild-type
variant and the luciferases with kemptide at the N- or C-terminus
were indistinguishable from that of the extracted luciferase
(Table 1). The activity expressed per unit of RNA was more
variable and lower for all the variants with phosphorylation sites
than for the wild-type. The pH-activity profile for KNt, KCt and
wild-type were very similar, but RRFS had a lower pH optimum
(Fig. 6a). The colour of the light emitted was assessed by






Fig. 6. Effect of pH on the activity and colour of the light emitted by
luciferase variants

(a) pH optimum curve. The results are expressed as % of maximal
activity (mean±s.e.m.) from three to five experiments, each in
triplicate. O, Wild-type recombinant; extracted luciferase; O,
RRFS; △, KNt; ▽, KCt. (b) pH effect on the colour of the light
produced. The ratio of chemiluminescent counts at 603 nm and
545 nm was measured in triplicate at each pH (mean±s.e.m.). O,
Wild-type recombinant; extracted luciferase; O, RRFS; △,
KNt; ▽, KCt. The luciferase from firefly tails was purified as
described by Sala-Newby et al. (1990b).



measuring the ratio of activities at 603 nm and 545 nm (Fig. 6b).
At alkaline pH no significant differences were detected but as the 
pH was decreased the variant RRFS had a significantly lower 
ratio, indicating that the light emitted was greener than for all the 
others. As chemiluminometers contain photomultipliers which 
are more sensitive to green than red light this colour change 
cannot explain the decrease in specific activity measured. The 
activity was measured under saturating concentrations of ATP 
and luciferin, suggesting that the was decreased. 

Several beetle luciferases have now been cloned: Photinus 
pyralis, Pyrophorus plagiophthalamus and Luciola cruciata (de 
Wet et al., 1987; Wood et al., 1989a,b; Tatsumi et al., 1989).
Spectral changes are known to occur in the light emitted by 
firefly luciferase in response to changes in pH and temperature, 
and in the presence of heavy metals (Seliger & McElroy, 1964). 
Work on four click-beetle luciferases that show 94-99 % sequence 
homology demonstrated that a small number of amino acid 
substitutions were responsible for the different colours displayed
by the luciferases. The spectral shift between the yellow-green
and yellow luciferases belongs to the amino acid set R-223, L-238 →
E, V, with the effect probably being due to R-223 → E (Wood
et al., 1989a,b). Since all the beetle luciferases use the same
luciferin, the colour of the light emitted in the reaction must
depend on the environment around the emitter (i.e. oxyluciferin).
The oxyluciferin can exist as a monoanion (ketonic form) or
dianion (enolic form) at acid and basic pH respectively. The
presence of an arginine in position 223 of the click-beetle
yellow-green luciferase seemed to be responsible for a shift to the
green in the light it emitted. The change V-217→R-217 that
generated RRFS in Photinus luciferase introduced a basic amino
acid in that area of the protein and also resulted in a shift to the
green of the light emitted, suggesting that a positive charge there
stabilized the oxyluciferin dianionic form, the green emitter.

The phosphorylation of the RRFS variant by the catalytic 
subunit of protein kinase A decreased its activity, and de- 
phosphorylation reversed the effect. The decrease in activity was 
accompanied by a spectral shift to the red that can account for 
part of the lower activity measured. The other two variants, KNt 
and KCt, showed no detectable differences from the wild-type 
luciferase in any aspect. The luciferase with kemptide at the C- 
terminus was expected to show properties different from the 
normal luciferase in view of the fact that the removal of 12 amino 
acids at the C-terminus nearly abolishes activity (Sala-Newby 
et al., 1990b). The removal of three amino acids (results not
shown) and the addition of the seven amino acids from kemptide 
at the C-terminus did not affect the catalytic properties. This 
could be important when using firefly luciferase or its variant in 
eukaryotic cells because the last three amino acids of the C- 
terminus contain a peroxisomal targeting signal (Keller et al.,
1987; Gould et al., 1987).

The RRFS variant provides, for the first time, an indicator 
potentially useful for measuring protein phosphorylation in 
intact cells, and has also highlighted a domain within the enzyme 
that results in changes in colour in response to a change in 
charge. Recognition peptides for other kinases could thus be 
engineered in this region of the protein, thereby establishing a 
universal strategy for measuring any protein kinase and visualiz- 
ing it in living cells (Hooper et al., 1990). 
ABSTRACT 

A simple, effective measure of synonymous codon usage bias, the Codon 
Adaptation Index, is detailed. The index uses a reference set of highly 
expressed genes from a species to assess the relative merits of each codon, 
and a score for a gene is calculated from the frequency of use of all codons 
in that gene. The index assesses the extent to which selection has been 
effective in moulding the pattern of codon usage. In that respect it is 
useful for predicting the level of expression of a gene, for assessing the 
adaptation of viral genes to their hosts, and for making comparisons of 
codon usage in different organisms. The index may also give an approximate 
indication of the likely success of heterologous gene expression. 

INTRODUCTION 

The determination of the DNA sequences of a large number of genes from 
a wide variety of species has revealed that, in a large proportion of cases, 
the alternative synonymous codons for any one amino acid are not used 
randomly (1, and references therein). Further, it has been noted that a part 
of this nonrandom usage is species, or rather taxon, specific (2). However, 
within species there is considerable heterogeneity between genes, and in the 
two best studied organisms, namely Escherichia coli and the yeast 
Saccharomyces cerevisiae, there is a clear positive correlation between 
degree of codon bias and level of gene expression (3,4). Examination of 
large data sets from these species reveals that within species differences 
are largely in the degree rather than the direction of codon usage bias 
(5,6). 

For many reasons it is desirable to quantify the degree of bias in 
codon usage in each gene in such a way that comparisons can be made both 
within and between species. One approach to this problem is to devise a 
measure for assessing the degree of deviation from a postulated impartial 
pattern of usage. The codon preference bias proposed by McLachlan et al. (7) 
is such a measure. Recently Sharp et al. (5) have proposed to calculate the 
chi square value for the deviation from random codon usage and then scale 
the value by the gene length (number of codons) so that comparisons can be 
made between genes. 

Another approach is to assess the relative merits of different codons 
from the viewpoint of translational efficiency. For example, Ikemura (1,8,9) 
has identified certain "optimal" codons in E.coli and yeast which are 
expected to be translated more efficiently than others, and calculated the 
frequency of optimal codons in a gene. The codon bias index of Bennetzen and 
Hall (4), for use with yeast genes, is essentially similar. Such indices are 
certainly useful, but have several disadvantages. First, some amino acids 
are usually excluded because it is not clear which codons are "optimal". 
Second, all codons considered are classified into only two categories, i.e., 
optimal and nonoptimal, with no recognition that some codons within each 
category are better than others. Third, there is no good basis for 
comparison between species because the proportional division of the codon 
table into the two categories may differ; e.g., Ikemura (1) identified 21 
optimal codons for 14 amino acids in E.coli, and 19 optimal codons for 13 
amino acids in yeast. 

Gribskov et al. (10) have recently proposed another index, the codon 
preference statistic. This statistic is based on the ratio of the likelihood 
of finding a particular codon in a highly expressed gene to the likelihood 
of finding that codon in a random sequence with the same base composition 
as that in the sequence under study. They show that the statistic is useful 
for locating genes in sequenced DNA, for predicting the relative level of 
their expression, and for detecting sequencing errors. However, the statistic 
is not normalized and therefore the values for two genes encoding proteins 
with different amino acid compositions can be quite different even if both 
genes use only the "best" codons. 

With various purposes in mind we have devised a new index. It is 
similar to the codon preference statistic but is normalized so that it is 
convenient for making comparisons both within and between species. After 
describing the index, we show some rather varied applications and indicate 
certain advantages over other indices. In recognition of the role of natural 
selection in producing high levels of codon bias, we call this statistic the 
Codon Adaptation Index. 

METHODS 

We recognize that even in E.coli and yeast the factors determining the 
frequency of synonymous codon usage are not completely understood, but that 
several points are clear: the pattern of codon usage in any particular gene 
is largely determined by natural selection and mutation (5,6); selection 
appears to occur via translational efficiency, so that synonymous codon 
usage in highly expressed genes is under the strongest selective constraints 
(4,8,9); in E.coli and yeast, very highly expressed genes appear to have the 
greatest degree of synonymous codon bias (3-6,8). From these points it is 
deduced that the pattern of codon usage in very highly expressed genes can 
reveal (i) which of the alternative synonymous codons for an amino acid is 
the most efficient for translation, and (ii) the relative extent to which 
other codons are disadvantageous. 

The first step is, then, to construct a reference table of relative 
synonymous codon usage (RSCU) values from very highly expressed genes of the 
organism in question. An RSCU value for a codon is simply the observed 
frequency of that codon divided by the frequency expected under the 
assumption of equal usage of the synonymous codons for an amino acid (5). 
Thus, 

RSCU_{ij} = \frac{X_{ij}}{\frac{1}{n_i} \sum_{j=1}^{n_i} X_{ij}}    [1] 

where X_{ij} is the number of occurrences of the jth codon for the ith amino 
acid, and n_i is the number (from one to six) of alternative codons for the 
ith amino acid. The relative adaptiveness of a codon, w_{ij}, is then the 
frequency of use of that codon compared to the frequency of the optimal 
codon for that amino acid: 

w_{ij} = RSCU_{ij} / RSCU_{imax} = X_{ij} / X_{imax}    [2] 

where RSCU_{imax} and X_{imax} are the RSCU and X values for the most 
frequently used codon for the ith amino acid. 
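
The two definitions above can be illustrated with a minimal sketch in Python. The function name `rscu_and_w` and the codon counts are hypothetical, chosen only for illustration; they are not taken from the reference gene sets used in the paper.

```python
def rscu_and_w(counts):
    """Return (rscu, w) dictionaries for the synonymous codons of one amino acid.

    counts maps each codon to its number of occurrences (X_ij) in the
    reference set of very highly expressed genes.
    """
    n = len(counts)                       # n_i: number of synonymous codons
    expected = sum(counts.values()) / n   # count expected under equal usage
    rscu = {codon: x / expected for codon, x in counts.items()}   # equation [1]
    x_max = max(counts.values())          # count of the most frequently used codon
    w = {codon: x / x_max for codon, x in counts.items()}         # equation [2]
    return rscu, w


# Hypothetical counts for the two Phe codons (illustrative only).
phe_counts = {"UUU": 30, "UUC": 70}
rscu, w = rscu_and_w(phe_counts)
```

With these made-up counts the RSCU values sum to n_i (here 2), and the ratio of two w values equals the ratio of the corresponding RSCU values, as equation [2] requires.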

Codon usage data have been compiled previously for 165 genes from 
E.coli (6), and for 110 genes from yeast (5). To obtain reference RSCU 
values, we have taken the 27 very highly expressed E.coli genes compiled by 
Sharp and Li (6), which include genes encoding 17 ribosomal proteins, four 
outer membrane proteins and four elongation factors. For yeast a set of 24 
genes has been taken from the high expression group previously identified 
(5). These include 16 genes encoding ribosomal proteins, one for an 
elongation factor, and seven loci encoding very abundant enzymes. The RSCU 
Table 1. Values of RSCU and w for codons in very highly expressed genes from 
E.coli and yeast. 

                E.coli          Yeast                      E.coli          Yeast 
             RSCU     w      RSCU     w                 RSCU     w      RSCU     w 

Phe  UUU    0.456  0.296    0.203  0.113    Ser  UCU   2.571  1.000    3.359  1.000 
     UUC    1.544  1.000    1.797  1.000         UCC   1.912  0.744    2.327  0.693 
Leu  UUA    0.106  0.020    0.601  0.117         UCA   0.198  0.077    0.122  0.036 
     UUG    0.106  0.020    5.141  1.000         UCG   0.044  0.017    0.017  0.005 

Leu  CUU    0.225  0.042    0.029  0.006    Pro  CCU   0.231  0.070    0.179  0.047 
     CUC    0.198  0.037    0.014  0.003         CCC   0.038  0.012    0.036  0.009 
     CUA    0.040  0.007    0.200  0.039         CCA   0.442  0.135    3.776  1.000 
     CUG    5.326  1.000    0.014  0.003         CCG   3.288  1.000    0.009  0.002 

Ile  AUU    0.466  0.185    1.352  0.823    Thr  ACU   1.804  0.965    1.899  0.921 
     AUC    2.525  1.000    1.643  1.000         ACC   1.870  1.000    2.063  1.000 
     AUA    0.008  0.003    0.005  0.003         ACA   0.141  0.076    0.025  0.012 
Met  AUG    1.000  1.000    1.000  1.000         ACG   0.185  0.099    0.013  0.006 

Val  GUU    2.244  1.000    2.161  1.000    Ala  GCU   1.877  1.000    3.005  1.000 
     GUC    0.148  0.066    1.796  0.831         GCC   0.228  0.122    0.948  0.316 
     GUA    1.111  0.495    0.004  0.002         GCA   1.099  0.586    0.044  0.015 
     GUG    0.496  0.221    0.039  0.018         GCG   0.796  0.424    0.004  0.001 

Tyr  UAU    0.386  0.239    0.132  0.071    Cys  UGU   0.667  0.500    1.857  1.000 
     UAC    1.614  1.000    1.868  1.000         UGC   1.333  1.000    0.143  0.077 
ter  UAA      -      -        -      -      ter  UGA     -      -        -      - 
ter  UAG      -      -        -      -      Trp  UGG   1.000  1.000    1.000  1.000 

His  CAU    0.451  0.291    0.394  0.245    Arg  CGU   4.380  1.000    0.718  0.137 
     CAC    1.549  1.000    1.606  1.000         CGC   1.561  0.356    0.008  0.002 
Gln  CAA    0.220  0.124    1.987  1.000         CGA   0.017  0.004    0.008  0.002 
     CAG    1.780  1.000    0.013  0.007         CGG   0.017  0.004    0.008  0.002 

Asn  AAU    0.097  0.051    0.100  0.053    Ser  AGU   0.220  0.085    0.070  0.021 
     AAC    1.903  1.000    1.900  1.000         AGC   1.055  0.410    0.105  0.031 
Lys  AAA    1.596  1.000    0.237  0.135    Arg  AGA   0.017  0.004    5.241  1.000 
     AAG    0.404  0.253    1.763  1.000         AGG   0.008  0.002    0.017  0.003 

Asp  GAU    0.605  0.434    0.713  0.554    Gly  GGU   2.283  1.000    3.898  1.000 
     GAC    1.395  1.000    1.287  1.000         GGC   1.652  0.724    0.077  0.020 
Glu  GAA    1.589  1.000    1.968  1.000         GGA   0.022  0.010    0.009  0.002 
     GAG    0.411  0.259    0.032  0.016         GGG   0.043  0.019    0.017  0.004 

Genes used: 

E.coli - 17 ribosomal protein genes, 4 elongation factor genes, 4 outer 
membrane protein genes, recA, dnaK (data from Ref. 6) 

Yeast - 16 ribosomal protein genes, TEF 1, 2 enolase genes, 2 GA-3-PDH 
genes, ADH 1, PGK, pyruvate kinase (data sources given in Ref. 5) 
and w values obtained for very highly expressed genes from E.coli and yeast 
are given in Table 1. 

The Codon Adaptation Index (CAI) for a gene is then calculated as the 
geometric mean of the RSCU values (from Table 1) corresponding to each of 
the codons used in that gene, divided by the maximum possible CAI for a gene 
of the same amino acid composition, i.e., 

CAI = CAI_{obs} / CAI_{max}    [3] 

where 

CAI_{obs} = \left( \prod_{k=1}^{L} RSCU_k \right)^{1/L}    [4] 

CAI_{max} = \left( \prod_{k=1}^{L} RSCU_{kmax} \right)^{1/L}    [5] 

where RSCU_k is the RSCU value for the kth codon in the gene, RSCU_{kmax} is 
the maximum RSCU value for the amino acid encoded by the kth codon in the 
gene, and L is the number of codons in the gene. 

Note that if a certain codon is never used in the reference set then 
the CAI for any other gene in which that codon appears becomes zero. To 
overcome this problem we assign a value of 0.5 to any X that would 
otherwise be zero. Also, the number of AUG and UGG codons are subtracted 
from L, since the RSCU values for AUG and UGG are both fixed at 1.0, and so 
do not contribute to the CAI. 

As illustration, consider the rpsU gene from E.coli which, excluding 
the initiation codon, comprises 70 codons and has the sequence: 

.CCG.GTA.ATT.AAA.GTA 

For that sequence and from the RSCU values in Table 1: 

CAI_{obs} = (3.288 \times 1.111 \times 0.466 \times 1.596 \times 1.111 \times \cdots)^{1/70} 

and 

CAI_{max} = (3.288 \times 2.244 \times 2.525 \times 1.596 \times 2.244 \times \cdots)^{1/70} 

From these two values and equation [3] we can obtain the CAI value. 
We note that equation [3] is exactly equivalent to: 

CAI = \left( \prod_{k=1}^{L} w_k \right)^{1/L}    [6] 
Table 2. CAI values for E.coli and yeast genes. 

        E.coli                         Yeast 
gene       CAI               gene          CAI 

17 RPs     0.467-0.813       16 RPs        0.529-0.915 
rpsU       0.726             histones      0.532-0.733 
rpoD       0.582 
dnaG       0.271             2μ plasmid    0.099-0.106 
lacI       0.296             GAL 4         0.116 
trpR       0.267             PPR 1         0.114 
lpp        0.849 (a)         GPD 1         0.929 (a) 
hsdS       0.218 (b)         mat A2        0.098 

RPs - ribosomal protein genes. 

(a) highest CAI value among data set. 

(b) lowest CAI value among data set. 



where w_k is the w value for the kth codon in the gene (see equation [2]). 
Therefore, for rpsU: 

CAI = (1.000 \times 0.495 \times 0.185 \times 1.000 \times 0.495 \times \cdots)^{1/70} 



Equation [6] saves computation time. To overcome real number underflow 
problems in computer calculations, equation [6] can be computed as: 

CAI = \exp \left( \frac{1}{L} \sum_{k=1}^{L} \ln w_k \right)    [7] 

or from a codon usage table: 

CAI = \exp \left( \frac{1}{L} \sum_{i=1}^{18} \sum_{j=1}^{n_i} X_{ij} \ln w_{ij} \right)    [8] 

where X_{ij} and n_i are as defined in equation [1]. 

There is no intrinsic effect of gene length (L) on CAI, but CAI values 
from short genes may be more variable due to sampling effects. 
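
The log-space form of equation [7] can be sketched in Python as follows. This is a minimal illustration, not the authors' program: the function name `cai`, the `skip` parameter and the small dictionary `W_ECOLI` are assumptions introduced here. The w values are the E.coli entries of Table 1 for the five rpsU codons quoted above, written in DNA spelling (T for U).

```python
import math

# E.coli relative adaptiveness values (w) from Table 1 for the five rpsU
# codons quoted in the text; only these four distinct codons are included.
W_ECOLI = {"CCG": 1.000, "GTA": 0.495, "ATT": 0.185, "AAA": 1.000}


def cai(codons, w_table, skip=("ATG", "TGG")):
    """Geometric mean of w over the codons, computed in log space (equation [7]).

    Codons for Met (ATG) and Trp (TGG) are dropped, so they do not count
    toward L. Every remaining codon must have a non-zero w in w_table.
    """
    used = [c for c in codons if c not in skip]
    log_sum = sum(math.log(w_table[c]) for c in used)
    return math.exp(log_sum / len(used))


# Initiation codon plus the first five codons of rpsU.
fragment = ["ATG", "CCG", "GTA", "ATT", "AAA", "GTA"]
fragment_cai = cai(fragment, W_ECOLI)
```

For this five-codon fragment the result is roughly 0.54; it is not the CAI of the whole gene (0.726 in Table 2), which requires all 70 codons. Summing logarithms rather than multiplying w values directly avoids the underflow that a product of many small numbers can cause for long genes.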

APPLICATIONS and DISCUSSION 

Predicting levels of gene expression within a species . 

CAI values clearly parallel levels of gene expression. Ribosomal 
protein genes are highly expressed, and have generally high CAI values 
[Figure 1: histogram panels for Yeast, E.coli and T7; horizontal axis CAI, 
0 to 1.0; vertical axis number of genes.] 

Figure 1. Distribution of CAI values for (a) 106 yeast genes, (b) 165 
E.coli genes, and (c) 50 bacteriophage T7 genes. In (a) and (b) 
ribosomal protein genes are cross-hatched. Plasmid genes are excluded. 



(Table 2, Figure 1). Among yeast ribosomal protein genes only that encoding 
S33 has a CAI < 0.6, and it is a very short gene (L = 65). Lowly expressed 
regulatory genes (e.g., lacI, trpR in E.coli; GAL 4, PPR 1 in yeast) have 
low CAI values (Table 2). In E.coli the relationship between codon bias and 
gene expression is perhaps best illustrated by considering operons (as 
suggested by Gouy and Gautier, Ref. 3). For example, within the 
macromolecular synthesis operon the expression levels are rpsU » rpoD » dnaG 
(11), and the CAI values for these genes are 0.726, 0.582 and 0.271, 
respectively (Table 2). Eight of the nine genes of the unc operon encode the 
Table 3. CAI values for genes in the unc operon of E.coli. 

                                        Gene product 
Pos   Gene    CAI            L      name      amount   sector 

1     papI    [illegible]           ?? 
2     papD    0.400          253    chi       1        F0 
3     papH    0.583          71     omega     10       F0 
4     papF    0.482          152    psi       2        F0 
5     papE    0.374          169    delta     1        F1 
6     papA    0.665          501    alpha     3        F1 
7     papC    0.403          273    gamma     1        F1 
8     papB    0.650          444    beta      3        F1 
9     papG    0.474          133    epsilon   1        F1 

Pos: gene position within the operon (1 = 5'). 
The relative amount of each gene product in the ATPase 
complex is taken from Ref. 12. 



eight subunits of the F0 and F1 sectors of the H+-ATPase complex, and the 
stoichiometry of these subunits is known (12). The CAI value is clearly 
correlated with the level of gene expression among the genes encoding 
subunits of the F1 sector (Table 3), with the CAI values for papA and papB 
being similar, and much higher than those for papE, papC and papG. Among 
genes encoding subunits in the F0 sector the rank order of CAI values 
corresponds to the relative amounts of the gene products required. The CAI 
for papH is perhaps surprisingly low, but this is a very short gene (Table 
3). The function of papI is unknown. The CAI value for papI is very low, and 
may indicate that this is a regulatory gene, or perhaps (see below) a 
noncoding open reading frame. 

Although many of the measures of codon bias discussed in the 
Introduction seem to be positively correlated with gene expression, we feel 
that CAI has the twin advantages of being simple to calculate and making 
greater quantitative use of available information (see 'Comparison of CAI 
with other indices' below). 

The positive correlation between degree of synonymous codon bias and 
expression level in E.coli (and yeast) seems firmly established, but the 
causal relationship between the two has been debated. We have concluded 
elsewhere (6) that the degree of codon bias reflects the past action of 
natural selection -- it is indicative of the level at which the gene is 
expressed, rather than dictating that level. This seems to concur with 
conclusions drawn from a theoretical model of the translation process (13). 
Table 4. CAI values for mammalian genes using E.coli and yeast RSCU values. 

                                    Host 
Heterologous gene            E.coli     Yeast 

Human alpha interferon       0.218      0.099 
Human insulin                0.307      0.043 
Human growth hormone         0.287      0.082 
Human factor VIII            0.205      0.114 
Human factor IX              0.263      0.176 
Bovine chymosin              0.326      0.086 


Predicting levels of heterologous gene expression . 

There is experimental evidence that certain codons can affect 
expression level (14-17). For example, the AGG codon markedly affects the 
translation rate of genes in E.coli (14,15). This suggests that for a 
heterologous gene to have a maximal level of expression its codon usage must 
correspond to that of the host. By using the RSCU values of potential hosts 
to calculate CAI values for a heterologous gene it should be possible to 
predict how well suited that gene would be to the translational systems of 
those hosts. In Table 4 the CAI values of some genes of biotechnological 
interest are given for two different potential hosts, E.coli and yeast. In 
each case these mammalian genes seem better 'adapted' to E.coli, suggesting 
that high expression might be more easily obtained in that system. Of 
course, in reality, the choice of host would probably depend on other 
practicalities. The CAI would, however, suggest whether it is likely to be 
either necessary or of any benefit to chemically synthesize a new gene, to 
include more appropriate codons. It should be stressed that the CAI is only 
an approximate indication of the suitability of the codon usage within a 
gene. For example, it takes no account of the distribution of codons along 
the gene, yet theoretical considerations suggest that this may be very 
important (18). 

A measure of evolutionary adaptedness . 

Under certain natural circumstances foreign genes are expressed in host 
organisms. Viral genes are an obvious example. Codon usage in the many 
bacteriophages which do not encode their own tRNA molecules should be 
adapted to the translational machinery of the host. Then the CAI, using host 
RSCU values, is an estimate of the degree of adaptation. For example, 
comparison of the pattern of codon usage in the genes of bacteriophage T7 
Table 5. CAI values for homologous genes from E.coli and T7. 

E.coli gene    CAI        T7 gene    CAI 

ssb            0.605      2.5        0.573 
dnaG           0.271      4          0.301 
polA           0.391      5          0.341 
                          6          0.387 


with the relative abundance of cognate tRNA molecules in E.coli (considered 
to be the usual host of T7) suggests that T7 genes are not so well adapted 
as E.coli's own genes, although there is clearly some adaptation (19,20). 
This seems to be confirmed by contrasting the distribution of CAI values for 
T7 genes with those of E.coli (Figure 1). However, the difference seen in 
Figure 1 could arise in part because the genes contrasted encode different 
products; for example, T7 encodes no ribosomal proteins. It has been 
reported that four genes in T7 are homologous to three E.coli genes (21). A 
comparison of these genes (Table 5) is not conclusive, because only ssb is 
highly adapted in E.coli, although in that case the T7 gene does have a 
lower CAI. The four T7 genes as a group do not seem to be significantly less 
adapted than the three E.coli genes. 

In cases where it has not been clear which organism represents the 
major host for a virus it may prove informative to calculate CAI values with 
the different RSCU values of potential hosts. For example, despite 
approximately 65% DNA homology between φX174 and G4, the genomes of these 
two "coliphages" show a remarkable difference with respect to the frequency 
of the recognition sites of enterobacterial restriction enzymes (22). While 
φX174 (as well as several other coliphages) has a significant avoidance of 
these sites, presumably reflecting adaptation to infecting E.coli, G4 does 
not. However, CAI values for the 10 genes of φX174 and G4 are very similar, 
suggesting that the patterns of codon usage of the two phages are adapted 
(to E.coli) to equivalent extents. 

Natural foreign gene expression would also occur if genes undergo 
horizontal transfer. Felmlee et al. (23) have discussed a possible example. 
They reported the DNA sequence of a region of the E.coli chromosome encoding 
four hemolysin genes, and found that their base composition and codon usage 
are atypical of that species. This, together with the observation that these 
genes are found in only a limited number of E.coli strains, was taken as 
evidence that the genes represent a recent acquisition to this species (23). 
The CAI values for these genes are indeed very low, ranging from 0.202 to 
0.243. These values are lower than those for nearly all other E.coli genes 
(see Figure 1, in which the hemolysin genes are not included), including 
some (e.g., araC and dnaG) which are expressed at very low levels. Hemolysin 
is an extracellular protein and would be expected to be expressed at much 
higher levels than araC or dnaG, so these low CAI values suggest that the 
hemolysin genes are not well adapted to E.coli, and seem to confirm the 
suggestion of a recent acquisition. If reference RSCU data were available 
for a variety of organisms from which the genes could have been transferred, 
it might be possible to determine the most likely source by comparison of 
CAI values. 

If plasmids were regularly subject to interspecific transfer, then 
their genes might not become adapted to any one host. Genes on E.coli 
plasmids tend to have less codon bias than chromosomal genes (3). We note 
that the three genes of the yeast 2 micron plasmid have very low CAI values 
(Table 2). 

Synonymous codon usage and the rate of molecular evolution . 

A major prediction of the neutral theory of molecular evolution (24) is 
an inverse relationship between the rate of evolution and the degree of 
selective constraint, i.e., the stronger the constraint the slower the rate 
of molecular evolution. Indeed, a great deal of evidence confirms this, 
including the observation that pseudogenes, which are under no apparent 
constraint, are the fastest evolving DNA sequences (25). That synonymous 
substitutions in protein coding genes occur at a slower rate than 
substitutions in pseudogenes (26,27) implies that there are selective 
constraints on the former. If the differences between genes in degree of 
codon usage bias largely reflect differences in selection pressure on 
synonymous codons, then the rate of synonymous substitution would be 
inversely related to the degree of codon bias. The CAI can be used to 
quantify this relationship. Comparisons of E.coli and Salmonella typhimurium 
genes do indeed show a significant negative correlation between the rate of 
synonymous substitution and the CAI (28). 

Comparison of codon usage in different organisms. 

Meaningful comparisons of codon usage in different organisms can be 
made if care is taken in defining the reference set of genes from which the 
RSCU values are calculated. The reference sets we have chosen for E.coli and 
yeast comprise very similar collections of genes, yet the distributions of 
CAI values for genes from these two organisms are rather different. Very 
highly expressed genes in yeast have on average a more extreme codon bias 
than their counterparts in E.coli, as seen for example with ribosomal 
protein genes (Table 2). The reference set of RSCU values reflects this, and 
so the genes with least codon usage bias in yeast have lower CAI values than 
genes in E.coli, as a result. It is particularly interesting to note that 
cluster analysis of yeast genes based on their synonymous codon usage 
clearly differentiates two groups, identified as comprising highly and 
moderately/lowly expressed genes (5), and that those two groups correspond 
almost exactly to the bimodal distribution of CAI values for yeast genes in 
Figure 1. By contrast, cluster analysis does not so easily differentiate 
highly and lowly expressed genes in E.coli or in T7 (5) and the 
distributions of CAI values from those organisms are unimodal (Figure 1). It 
is not clear why selection has apparently been more successful in producing 
high codon bias in yeast than in E.coli. Li (29) has shown that the 
effectiveness of selection in maintaining synonymous codon bias depends 
largely on the strength of selection and effective population size. It could 
be that the strength of selection is stronger in yeast than in E.coli 
because the required amount of certain gene products, such as ribosomal 
proteins, is larger. It is also possible that the effective population size 
is larger in yeast than in E.coli because the latter has a largely clonal 
population structure (30). 

We note that comparisons between species can be difficult when the 
reference sets of genes have quite different levels of bias in codon usage. 
For example, very highly expressed genes have a much lower bias in codon 
usage in Bacillus subtilis than in E.coli or yeast (Shields and Sharp, in 
prep.). Then, in B.subtilis, there are few codons with very low w values. 
As a consequence, CAI values for other genes in B.subtilis are, on average, 
higher than those seen in the other species, even though the B.subtilis 
genes have clearly less bias. The CAI_{obs} given by equation [4] is less 
affected by this difference in the reference set, and may form a better 
basis for comparison between species under these circumstances. 

Identification of protein-coding reading frames. 

Several of the indices of codon usage bias were originally devised in 
order to ascertain the likelihood that open reading frames are indeed 
protein-coding. As with the other measures, the CAI should be useful in this 
context, particularly in locating genes of moderate to high expression. 
However, some of the points outlined above indicate that difficulties may 
arise in interpreting low CAI values. Thus, while a high CAI is probably a 
good indication that a reading frame is protein-coding, a low CAI may 
indicate a gene of low expression, a gene of heterologous origin (as with 
the hemolysin genes), or a noncoding region that happens to contain no 
termination codons. The CAI value expected for a random sequence can easily 
be calculated, but a relatively high value for a noncoding sequence may 
arise simply because DNA is not a random sequence of nucleotides, or because 
there is a coding sequence on the complementary strand (31). For example, an 
E.coli gene with no UUA, CUA or UCA codons, but otherwise having the typical 
codon composition of a nonhighly expressed gene (6), would give rise to an 
in phase open reading frame on the complementary strand with a CAI of 
approximately 0.28, which is similar to the lower values seen for E.coli 
genes (Figure 1) and somewhat higher than the value (about 0.17) expected 
for a random sequence. 

Comparison of CAI with other indices. 

The CAI is a very simple measure of the extent of synonymous codon 
usage bias, specifically in the direction of the bias seen in highly 
expressed genes. It has the advantage, compared with indices which measure 
only the frequency of certain optimal codons, of taking account of all 59 
codons where synonymous alternatives exist, each in a quantitative manner. 
For example, both the codon bias index (4) and the frequency of optimal 
codons (1) treat GCU and GCC equally, as preferred codons for Ala in yeast, 
and yet the frequency of GCU is approximately three times that of GCC in 
very highly expressed genes (Table 1). With heterologous gene expression in 
mind it may be of primary importance to know the frequency of particularly 
disadvantageous codons in a gene. Simpler indices compound these very rare 
codons with others not in the 'optimal' category. Thus in E.coli AUA and AUU 
are treated equally (1), despite their very different frequency of use (see 
Table 1, and Ref. 6). Again the CAI takes account of these differences 
quantitatively. 

The codon preference statistic (10) is similar but not identical to the 
CAI_{obs} given by equation [4]. One difference is that in calculating the 
codon preference statistic the p values (analogous to RSCU in equation [4]) 
are adjusted to take account of base composition. Another difference is that 
the CAI value is scaled to allow for the different amino acid compositions of 
different proteins (see equation [3]), and has a range from 0 to 1.0. 
Although this scaling cannot completely compensate for differing amino 
acid compositions, it facilitates comparisons between genes. 
Our discussion of the use of the Codon Adaptation Index has focussed on 
unicellular organisms because the determinants of codon usage in 
multicellular organisms are not well understood (1). For example, it appears 
that the mammalian genome comprises regions of quite different G+C content 
(32), and that local G+C content is an important influence on codon usage in 
any one gene (1). Also tRNA abundances are important selective constraints 
on codon usage, and in multicellular organisms tRNA populations vary among 
tissues. We also note that the only mammalian ribosomal protein genes for 
which DNA sequence data are available (two from mouse and two from rat; 
see Ref. 33) do not seem to show particularly high synonymous codon bias. It 
may be possible in the near future to derive a reference set of RSCU values 
from other highly expressed mammalian genes, and/or it may prove necessary 
to take into account the tissue in which the gene is expressed, for example 
by having several reference sets. 
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ABSTRACT 

The genetic code is degenerate, but alternative synonymous codons are 
generally not used with equal frequency. Since the pioneering work of 
Grantham's group (1,2) it has been apparent that genes from one species 
often share similarities in codon frequency; under the "genome hypothesis" 
(1,2) there is a species-specific pattern to codon usage. 

However, it has become clear that in most species there are also 
considerable differences among genes (3-7). Multivariate analyses have 
revealed that in each species so far examined there is a single major trend 
in codon usage among genes, usually from highly biased to more nearly even 
usage of synonymous codons. Thus, to represent the codon usage pattern of an 
organism it is not sufficient to sum over all genes (8), as this conceals 
the underlying heterogeneity. Rather, it is necessary to describe the trend 
among genes seen in that species. We illustrate these trends for six species 
where codon usage has been examined in detail, by presenting the pooled 
codon usage for the 10% of genes at either end of the major trend (Table 1). 

Closely-related organisms have similar patterns of codon usage, and so 
the six species in Table 1 are representative of wider groups. For example, 
with respect to codon usage, Salmonella typhimurium closely resembles E.coli 
(9), while all mammalian species so far examined (principally mouse, rat and 
cow) largely resemble humans (4,8). 



CAUSES OF WITHIN-SPECIES DIVERSITY 

Biased codon usage may result from a combination of several factors, viz. 
biases in the pattern of mutation, (translational) selection among 
synonymous codons, or selection against particular structures in DNA. 
Within-species heterogeneity in codon usage has been most clearly elucidated 
in E. coli; the major trend is from a strong bias towards a particular subset 
of codons in highly expressed genes to more even codon usage in lowly 
expressed genes (3,4,7). The heavily favoured codons in highly expressed 
E. coli genes are those best recognised by the most abundant tRNA species 
(3,4), and it seems clear that selection mediated by the translation process 
can occur among alternative synonymous codons (10,11). In contrast, most of 
the deviation from equal synonym use in the lowly expressed genes is likely 
to reflect nonrandom patterns of mutation (7,12). Then the pattern of bias 
in a particular gene reflects a mutation-selection balance at a point 
determined by the strength of translational selection on that gene (7,9,12). 

Similar observations have been made for S.cerevisiae (4,5,12,13). In 
B. subtilis (14) and S. pombe (15) there are similar trends among genes, but 
there is less information about tRNA abundances. The pattern of codon 
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Table 1. Codon usage diversity within six species. 

               E. coli   B. subtilis S. cerevisiae      S. pombe    Drosophila         Human
           high    low   high    low   high    low   high    low   high    low    G+C    A+T

Phe  UUU   0.34   1.33   0.70   1.48   0.19   1.38   0.44   1.28   0.12   0.86   0.27   1.20   UUU
     UUC   1.66   0.67   1.30   0.52   1.81   0.62   1.56   0.72   1.88   1.14   1.73   0.80   UUC
Leu  UUA   0.06   1.24   2.71   0.66   0.49   1.49   0.28   1.79   0.03   0.62   0.05   0.99   UUA
     UUG   0.07   0.87   0.00   1.03   5.34   1.48   2.16   0.80   0.69   1.05   0.31   1.01   UUG
Leu  CUU   0.13   0.72   2.13   1.24   0.02   0.73   2.44   1.55   0.25   0.80   0.20   1.26   CUU
     CUC   0.17   0.65   0.00   0.93   0.00   0.51   1.13   0.31   0.72   0.90   1.42   0.80   CUC
     CUA   0.04   0.31   1.16   0.34   0.15   0.95   0.00   0.87   0.06   0.60   0.15   0.57   CUA
     CUG   5.54   2.20   0.00   1.80   0.02   0.84   0.00   0.68   4.25   2.04   3.88   1.38   CUG
Ile  AUU   0.48   1.38   0.91   1.38   1.26   1.29   1.53   1.77   0.74   1.27   0.45   1.60   AUU
     AUC   2.51   1.12   1.96   1.14   1.74   0.66   1.47   0.59   2.26   0.95   2.43   0.76   AUC
     AUA   0.01   0.50   0.13   0.48   0.00   1.05   0.00   0.64   0.00   0.78   0.12   0.64   AUA
Met  AUG   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   AUG
Val  GUU   2.41   1.09   1.88   0.83   2.07   1.13   1.61   2.04   0.56   0.74   0.09   1.32   GUU
     GUC   0.08   0.99   0.25   1.49   1.91   0.76   2.39   0.65   1.59   0.93   1.03   0.69   GUC
     GUA   1.12   0.63   1.38   0.76   0.00   1.18   0.00   1.06   0.06   0.53   0.11   0.80   GUA
     GUG   0.40   1.29   0.50   0.92   0.02   0.93   0.00   0.24   1.79   1.80   2.78   1.19   GUG

Ser  UCU   2.81   0.78   3.45   0.77   3.26   1.56   3.14   1.33   0.87   0.55   0.45   1.63   UCU
     UCC   2.07   0.60   0.00   0.81   2.42   0.81   2.57   0.52   2.74   1.41   2.09   0.60   UCC
     UCA   0.06   0.95   1.50   1.29   0.08   1.30   0.00   1.56   0.04   0.84   0.26   1.23   UCA
     UCG   0.00   1.04   0.00   0.94   0.02   0.66   0.00   0.67   1.17   1.30   0.68   0.13   UCG
Pro  CCU   0.15   0.75   2.29   0.99   0.21   1.17   2.00   1.21   0.42   0.43   0.58   1.50   CCU
     CCC   0.02   0.68   0.00   0.27   0.02   0.75   2.00   0.83   2.73   1.02   2.02   0.83   CCC
     CCA   0.42   1.03   1.14   1.08   3.77   1.38   0.00   1.51   0.62   1.04   0.36   1.57   CCA
     CCG   3.41   1.54   0.57   1.66   0.00   0.70   0.00   0.45   0.23   1.51   1.04   0.10   CCG
Thr  ACU   1.87   0.76   2.21   0.39   1.83   1.23   1.89   1.52   0.65   0.70   0.36   1.45   ACU
     ACC   1.91   1.29   0.00   0.98   2.15   0.78   2.11   1.04   3.04   1.58   2.37   0.92   ACC
     ACA   0.10   0.68   1.38   1.64   0.00   1.38   0.00   1.04   0.10   0.77   0.36   1.45   ACA
     ACG   0.12   1.28   0.41   0.98   0.01   0.60   0.00   0.40   0.21   0.95   0.92   0.18   ACG
Ala  GCU   2.02   0.61   2.94   0.78   3.09   1.07   2.30   1.79   0.95   0.91   0.45   1.59   GCU
     GCC   0.18   1.18   0.08   1.14   0.89   0.76   1.49   0.50   2.82   1.93   2.38   0.92   GCC
     GCA   1.09   0.79   0.60   1.19   0.03   1.49   0.21   1.14   0.09   0.59   0.36   1.38   GCA
     GCG   0.71   1.42   0.38   0.89   0.00   0.68   0.00   0.57   0.14   0.57   0.82   0.11   GCG



Relative Synonymous Codon Usage (RSCU; Ref.5) values are presented for two 
groups of genes from each of six species: Escherichia coli, Bacillus 
subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila 
melanogaster and Homo sapiens. 

(An RSCU value is the observed number of codons divided by the number 
expected if all codons for that amino acid were used equally.) 
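
The RSCU definition in the legend can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name rscu and the codon counts are hypothetical (the counts are chosen for the example and are not data from the paper), and amino acids with no observed codons are simply skipped.

```python
from collections import Counter

# Standard genetic code in UCAG order ("*" marks termination codons).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def rscu(codon_counts):
    """Return {codon: RSCU} for sense codons, given observed codon counts."""
    families = {}
    for codon, aa in CODON_TO_AA.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid not observed: RSCU is undefined, so skip it
        expected = total / len(codons)  # count expected under equal usage
        for c in codons:
            values[c] = codon_counts.get(c, 0) / expected
    return values

# Hypothetical counts for the two Phe codons (illustration only).
example = rscu(Counter({"UUU": 17, "UUC": 83}))
print(round(example["UUU"], 2), round(example["UUC"], 2))
```

With these counts the two Phe values sum to 2, the number of synonymous codons, which is the same property each amino acid block of Table 1 shows within rounding.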






Nucleic Acids Research 



Table 1 (cont.) 



E. coli B. subtilis S. cerevisiae S. pombe Drosophila Human 
high low high low high low high low high low G+C A+T 
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CAU 


0. 


45 


1. 


21 


2.00 


1.28 


0. 


32 


1. 


16 


0.56 


1.44 


0.29 


0. 


86 


0. 
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1. 


28 


CAU 
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1. 


55 


0. 


79 


0.00 


0.72 


1. 


68 


0. 
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1.44 


0.56 


1.71 


1. 


14 


1. 
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0. 


72 
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0. 


76 


1.71 


0.88 


1. 


98 


1. 
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1.85 


1.67 


0.03 


0. 
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0. 


21 


0. 


98 
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CAG 


1. 


88 


1. 


24 


0.29 


1.13 


0. 


02 


0. 


90 


0.15 


0.33 


1.97 


1. 


12 


1. 


79 


1. 


02 


CAG 


               E. coli   B. subtilis S. cerevisiae      S. pombe    Drosophila         Human
           high    low   high    low   high    low   high    low   high    low    G+C    A+T

Asn  AAU   0.02   1.12   0.47   1.21   0.06   1.28   0.30   1.41   0.13   1.13   0.33   1.20   AAU
     AAC   1.98   0.88   1.53   0.79   1.94   0.72   1.70   0.59   1.87   0.87   1.67   0.80   AAC
Lys  AAA   1.63   1.50   1.83   1.47   0.16   1.24   0.10   1.27   0.06   0.81   0.34   1.17   AAA
     AAG   0.37   0.50   0.17   0.53   1.84   0.76   1.90   0.73   1.94   1.19   1.66   0.83   AAG
Asp  GAU   0.51   1.43   0.53   1.16   0.70   1.38   0.78   1.56   0.90   1.10   0.36   1.29   GAU
     GAC   1.49   0.57   1.47   0.84   1.30   0.62   1.22   0.44   1.10   0.90   1.64   0.71   GAC
Glu  GAA   1.64   1.28   1.40   1.27   1.98   1.29   0.69   1.20   0.19   0.73   0.26   1.33   GAA
     GAG   0.36   0.72   0.60   0.73   0.02   0.71   1.31   0.80   1.81   1.27   1.74   0.67   GAG
Cys  UGU   0.60   0.94   0.00   0.94   1.80   1.10   0.14   1.56   0.07   0.71   0.42   1.09   UGU
     UGC   1.40   1.06   2.00   1.06   0.20   0.90   1.86   0.44   1.93   1.29   1.58   0.91   UGC
ter  UGA                                                                                       UGA
Trp  UGG   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   1.00   UGG
Arg  CGU   4.47   1.71   3.11   0.54   0.63   0.64   5.17   1.89   2.65   0.69   0.38   0.64   CGU
     CGC   1.53   2.41   1.78   1.21   0.00   0.39   0.83   0.26   3.07   1.55   2.72   0.36   CGC
     CGA   0.00   0.52   0.00   0.74   0.00   0.65   0.00   0.86   0.07   1.12   0.31   0.81   CGA
     CGG   0.00   0.80   0.00   0.81   0.00   0.34   0.00   0.43   0.00   1.12   1.53   0.51   CGG
Ser  AGU   0.13   1.01   0.45   0.56   0.06   0.97   0.14   1.48   0.04   0.89   0.31   1.26   AGU
     AGC   0.93   1.62   0.60   1.63   0.16   0.70   0.14   0.44   1.13   1.01   2.22   0.94   AGC
Arg  AGA   0.00   0.37   1.11   2.02   5.37   2.51   0.00   1.71   0.00   0.56   0.22   2.40   AGA
     AGG   0.00   0.19   0.00   0.67   0.00   1.47   0.00   0.86   0.21   0.95   0.84   1.28   AGG
Gly  GGU   2.27   1.29   1.38   0.54   3.92   1.32   3.36   1.87   1.34   0.91   0.34   0.84   GGU
     GGC   1.68   1.31   0.97   1.30   0.06   0.92   0.59   0.27   1.66   1.65   2.32   0.76   GGC
     GGA   0.00   0.64   1.66   1.24   0.00   1.22   0.05   1.60   0.99   0.98   0.29   1.79   GGA
     GGG   0.04   0.76   0.00   0.92   0.02   0.55   0.00   0.27   0.00   0.46   1.05   0.61   GGG



For each species, genes have been ranked according to their position along 
the major intraspecific trend in codon bias (see text). The highest 10% and 
the lowest 10% of genes have been drawn from: 165 E. coli genes (7), 76 
B. subtilis genes (8,14), 154 S. cerevisiae genes (5,8), 40 S. pombe genes 
(15), 84 D.melanogaster genes (16) and 290 human genes (8). The sample size 
for S. pombe is rather small, but the codon frequencies appear to be 
reliable (15). Full gene listings are available from the authors. 
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frequencies in lowly expressed genes from B. subtilis is most strongly 
indicative of mutational bias (14). 

Recently, we have reported evidence of selection among synonymous codons 
in the multicellular organism D.melanogaster (16). In contrast, among human 
genes the major variation is in G+C content associated with the local base 
composition around the gene (6). This variation has not been attributed to 
translational selection, and is most easily explained in terms of variation 
in mutation biases among chromosomal regions. 



CODON BIAS RANKINGS 

For E. coli, B. subtilis, S. cerevisiae and S. pombe codon bias in a gene is 
measured by the Codon Adaptation Index (CAI). A species-specific reference 
set of very highly expressed genes is used to assess the relative fitness of 
each synonymous codon, and the CAI for a gene is then calculated as the 
geometric mean of the fitness values for each codon in that gene. (For a 
full description, see Ref.17.) 
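
As a rough illustration of the geometric-mean calculation just described, the Python sketch below derives fitness values (w) for the four Ala codons from the S. cerevisiae "high" column of Table 1 and averages them over a toy gene. The function names, the three-codon toy gene, and the floor W_FLOOR are assumptions made for this example; the floor only keeps the logarithm defined for codons absent from the reference set and is not taken from the text. Ref.17 remains the authoritative description.

```python
import math

# Assumption: a small floor so that log() is defined when a codon has RSCU 0.
W_FLOOR = 0.01

def relative_adaptiveness(rscu_by_codon):
    """w for each codon: its RSCU divided by the largest RSCU among its synonyms."""
    top = max(rscu_by_codon.values())
    return {codon: max(value / top, W_FLOOR) for codon, value in rscu_by_codon.items()}

def cai(codons, w):
    """Geometric mean of the w values of the codons in a gene."""
    logs = [math.log(w[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))

# RSCU values for Ala, highly expressed S. cerevisiae genes (Table 1).
ala_rscu = {"GCU": 3.09, "GCC": 0.89, "GCA": 0.03, "GCG": 0.00}
w = relative_adaptiveness(ala_rscu)

toy_gene = ["GCU", "GCU", "GCC"]  # hypothetical stretch of three Ala codons
print(round(cai(toy_gene, w), 3))
```

A real calculation would build w for every amino acid family from the reference gene set and run over all codons of the gene; only one family is shown here to keep the example short.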

Since the biological basis of codon frequencies in Drosophila is not yet 
so firmly established (for example, there may be more than one optimal set 
of codons, depending on the tissue of gene expression) we have simply 
estimated codon bias as the deviation from equal synonym use, by a "chi- 
square" scaled by gene length (16); this index is very highly correlated 
with the major trend among genes. Finally, human genes are ranked by G+C 
content at silent positions, since this is the major source of variation 
among genes (4,6). 
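
One possible reading of the "chi-square" scaled by gene length is sketched below in Python. This is an assumption-laden illustration, not the program offered by the authors: the exact scaling used in Ref.16 is not given here, so the sketch simply divides the chi-square for deviation from equal synonym use by the number of codons counted. The function name and the codon counts are hypothetical.

```python
def scaled_chi_square(codon_counts, families):
    """Chi-square for deviation from equal synonym use, divided by codons counted."""
    chi2 = 0.0
    n_codons = 0
    for codons in families:
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent from the gene
        expected = total / len(codons)  # equal use of all synonyms
        chi2 += sum((codon_counts.get(c, 0) - expected) ** 2 / expected for c in codons)
        n_codons += total
    return chi2 / n_codons

# Two synonymous families only, to keep the illustration small.
families = [["UUU", "UUC"], ["AAA", "AAG"]]
even = scaled_chi_square({"UUU": 10, "UUC": 10, "AAA": 5, "AAG": 5}, families)
biased = scaled_chi_square({"UUU": 0, "UUC": 20, "AAA": 0, "AAG": 10}, families)
print(even, biased)
```

Equal use of synonyms gives zero, and exclusive use of one codon in each two-codon family gives the largest value for that family, so the index rises with bias regardless of which codons are favoured.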

FORTRAN 77 programs to calculate these indices are available (on IBM-type 
floppy disks) from the authors on request. 
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A Complex Mutant of TEM-1 |3-Lactamase with Mutations 
Encountered in Both IRT-4 and Extended-Spectrum TEM-15, 
Produced by an Escherichia coli Clinical Isolate 
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R. LABIA,-'' AND J. SIROT' 

Laboratoire de Bacteriologie-Virologie, Faculte de Medecine, 63001 Clermont-Ferrand Cedex; Service de 
Bacteriologie et Virologie, Centre Hospitalier Universitaire de Grenoble, 38043 Grenoble Cedex 9; and 
UMR 175 Centre National de la Recherche Scientifique-MNHN, 29000 Quimper, France 
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Escherichia coli GR102 was isolated from feces of a leukemic patient. It expressed different levels of 
resistance to amoxicillin or ticarcillin plus clavulanate and to the various cephalosporins tested. The double-disk 
synergy test was weakly positive. Production of a β-lactamase with a pI of 5.6 was transferred to E. coli 
HB101 by conjugation. The nucleotide sequence was determined by direct sequencing of the amplification 
products obtained by PCR performed with TEM gene primers. This enzyme differed from TEM-1 (blaT-1B 
gene) by four amino acid substitutions: Met→Leu-69, Glu→Lys-104, Gly→Ser-238, and Asn→Asp-276. The 
amino acid substitutions Leu-69 and Asp-276 are known to be responsible for inhibitor resistance of the IRT-4 
mutant, as are Lys-104 and Ser-238 substitutions for hydrolytic activity of the extended-spectrum β-lactamases 
TEM-15, TEM-4, and TEM-3. These combined mutations led to a mutant enzyme which conferred a level of 
resistance to coamoxiclav (MIC, 64 µg/ml) much lower than that conferred by IRT-4 (MIC, 2,048 µg/ml) but 
higher than that conferred by TEM-15 or TEM-1 (MIC, 16 µg/ml). In addition, the MIC of ceftazidime for E. 
coli transconjugant GR202 (1 µg/ml) was lower than that for E. coli TEM-15 (16 µg/ml) and higher than that 
for E. coli IRT-4 or TEM-1 (0.06 µg/ml). The MICs observed for this TEM-type enzyme were related to the 
kinetic constants Km and kcat and the 50% inhibitory concentration, which were intermediate between those 
observed for IRT-4 and TEM-15. In conclusion, this new type of complex mutant derived from TEM-1 (CMT-1) 
is able to confer resistance at a very low level to inhibitors and at a low level to extended-spectrum 
cephalosporins. CMT-1 received the designation TEM-50. 



Overproduction of Escherichia coli chromosomal β-lactamase 
is one cause of resistance to β-lactam-β-lactamase inhibitor 
combinations such as amoxicillin (AMX)-clavulanate 
(CA) and also results in reduced susceptibility to all β-lactams 
except carbapenems (15). 

In E. coli, resistance to all β-lactams except cephamycins and 
carbapenems may be caused by extended-spectrum β-lactamases. 
These enzymes are susceptible to β-lactamase inhibitors 
such as CA (10, 14, 15) and are therefore detected by 
synergy tests (10), and strains producing such mutants are 
often susceptible to β-lactam-β-lactamase inhibitor combinations. 

In E. coli isolates, the most recently discovered mechanism 
of resistance to AMX-CA is production of inhibitor-resistant 
TEM β-lactamases (IRT) (8). 

E. coli GR102, isolated from feces of a leukemic patient in 
the hematology unit of the teaching hospital of Grenoble, 
France, harbored an unusual β-lactam resistance phenotype 
with resistance to AMX and ticarcillin (TIC) alone and combined 
with CA and resistance to all cephalosporins, including 
cephamycins, at various levels. In addition, the double-disk 
synergy test used for extended-spectrum β-lactamase detection 
was weakly positive. 

This complex phenotype suggested that the β-lactam resistance 
of the strain was due to the presence of several β-lactamases 
or a combination of different mechanisms of resistance 
to β-lactams. 

MATERIALS AND METHODS 

Strains. The strains used included E. coli GR102, a clinical isolate producing 
a novel β-lactamase; E. coli HB101, used as a recipient strain for transfer; E. coli 
HB101/pin (TEM-1 producing); E. coli CF0042 (IRT-4-TEM-35 producing) 
(8); and E. coli transformant DH5α (CF244), obtained by electroporation from 
Klebsiella pneumoniae Kp240 (TEM-15 producing) (16). 

Susceptibility to β-lactams. The MICs of AMX, TIC, cephalothin (CF), cefotaxime 
(CTX), ceftazidime (CAZ), aztreonam (ATM), cefepime (FEP), and 
cefpirome (CPO) alone and combined with CA at a fixed concentration of 2 
µg/ml were determined. A method of dilution with Mueller-Hinton agar (Sanofi 
Diagnostics Pasteur, Marnes-la-Coquette, France) and an inoculum of 10^4 CFU 
per spot were used. Antibiotics were provided as powders by SmithKline 
Beecham Pharmaceuticals (AMX, TIC, and CA), Roussel-Uclaf (CTX and 
CPO), Glaxo Wellcome Research and Development (CAZ), and Bristol-Myers 
Squibb (ATM and FEP). 

Detection of extended-spectrum β-lactamase was performed with the double-disk 
synergy test as described by Jarlier et al. (10). 

Isoelectric focusing. Isoelectric focusing was performed with polyacrylamide 
gels containing ampholines with a pH range of 3.5 to 10.0 as previously described 
(19), and β-lactamases with known pIs (TEM-1 [pI 5.4], TEM-2 [pI 5.6], TEM-15 
[pI 6.0], and IRT-4 [pI 5.2]) were used as standards. 

Transfer experiment. A transfer experiment was performed with E. coli 
GR102 and the recipient E. coli HB101. Transconjugants were selected on agar 
containing rifampin (300 µg/ml) and gentamicin (8 µg/ml) or CAZ (0.5 µg/ml). 

Sequencing of DNA amplified by PCR. On the assumption that the transconjugant 
strain contained blaTEM, single-stranded DNA template was generated 
for sequencing by PCR performed with an asymmetric ratio of amplification 
primers A and B, and the nucleotide sequence was determined as previously 
described (3), by direct sequencing of the amplified product obtained from the 
transconjugant E. coli GR202. 

Determination of β-lactamase kinetic parameters kcat and kcat/Km. Affinity 
(Km) and catalytic activity (kcat) were determined with highly purified 
extracts (≥97% pure) by using a computerized microacidimetric method (13). 
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TABLE L Nucleotide and amino acid substitutions in hliiy^^ genes 



                 Nucleotide (amino acid)^b in:
Nucleotide^a     blaTEM-1 (Tn2)    blaTEM-15    [illegible]    blaCMT-1

226              T (Phe)           C            T              T
317              C (Gln-39)        C            C              C
346              A (Glu)           A            G              A
407              A (Met-69)        A            C (Leu)        C (Leu)
436              T (Gly)           C            T              T
512              G (Glu-104)       A (Lys)      G              A (Lys)
604              T (Ala)           G            G              T
682              T (Thr)           T            T              T
914              G (Gly-238)       A (Ser)      G              A (Ser)
925              G (Gly)           G            G              G
1022             A (Asn-276)       A            G (Asp)        G (Asp)

^a Nucleotide numbering is according to Sutcliffe (21). 
^b The amino acid is indicated when a point mutation leads to an amino acid 
substitution compared with the sequences of TEM-1 (Tn2). Numbering is according 
to Ambler et al. (1). 



All β-lactamases were purified from crude extracts by size exclusion chromatography 
on Sephadex G-100 (Pharmacia), preparative isoelectric focusing, and 
reverse-phase high-performance liquid chromatography on a C18 Nucleosil 500A 
column (Interchim) as described by Brun et al. (2). The homogeneity of the 
preparations was determined by analytical sodium dodecyl sulfate-polyacrylamide 
gel electrophoresis. The kcat, Km, and kcat/Km values of enzyme CMT-1 were 
compared with those of the β-lactamases TEM-1, TEM-15, and IRT-4. The 
kinetics of TEM-1 and its mutants toward penicillins and cephalosporins were 
compared. Inhibition studies of TEM-1 and the mutant enzymes with CA, sulbactam, 
and tazobactam were performed. The affinity of the enzyme for the 
inhibitor, expressed as the inhibition constant (Ki), was measured by using competition 
procedures with benzylpenicillin. It is determined from the extrapolated 
rate at the time when the inhibitor is added. The 50% inhibitory concentration 
(IC50) was determined after incubation of the inhibitor and the enzyme for 10 
min (completed inactivation) at 37°C before measurement of the remaining 
enzymatic activity. The IC50 is defined as the inhibitor concentration causing 
50% inhibition of benzylpenicillin hydrolysis by the enzyme. 

RESULTS 

Resistance phenotype of E. coli GR102. E. coli GR102 expressed 
a complex β-lactam resistance phenotype with resistance 
to AMX and TIC alone and combined with CA and 
resistance to cephalosporins at various levels: high-level resistance 
to narrow-spectrum cephalosporins and low-level resistance 
to extended-spectrum cephalosporins (MICs, 1 to 32 
µg/ml). A positive synergy test with CA suggested the presence 
of a mutant extended-spectrum β-lactamase of class A origin. 
In addition, this strain had reduced susceptibility to cefoxitin 
(MIC, 128 µg/ml) and, to a lesser extent, cefotetan and moxalactam 
(data not shown). This reduced susceptibility to cephamycins 
was probably related to decreased permeability of the 
strain for β-lactams, since E. coli GR102 had lost an outer 
membrane protein with a molecular mass of 40 kDa (data not 
shown). Overproduction of the chromosomal cephalosporinase 
was not detected. 

This strain was also resistant to aminoglycosides (tobramycin, 
gentamicin, and netilmicin), probably owing to production 
of an AAC(3)-II enzyme, and to chloramphenicol, tetracyclines, 
and sulfonamides. 

Conjugative transfer and isoelectric focusing. The gene encoding 
resistance to β-lactams, except cephamycins, was transferred 
by conjugation from E. coli GR102 to rifampin-resistant 
E. coli HB101 (GR202). Selection for CAZ or gentamicin resistance 
revealed the transfer of an 85-kb plasmid conferring 
resistance to β-lactams, aminoglycosides (tobramycin, gentamicin, 
and netilmicin), tetracyclines, and sulfonamides. 

By isoelectric focusing, two bands at pIs 5.4 and 5.6 were 
observed in E. coli GR102 and one band at pI 5.6 was observed 
in E. coli transconjugant GR202. 

Nucleotide sequencing. As shown in Table 1, from the transconjugant 
GR202 producing a β-lactamase with a pI of 5.6, 
nucleotide sequencing revealed a blaTEM gene identical to the 
blaT-1B gene (Tn2) at positions 226, 317, 346, 436, 604, 682, 
and 925, which discriminate the blaTEM genes (4, 7). 

The blaTEM gene from the E. coli transconjugant differed 
from the blaT-1B gene by four point mutations. These mutations 
consisted of the nucleotide change A→C at position 407, 
which leads to the amino acid substitution Met→Leu at position 
69 (1); the nucleotide change G→A at positions 512 and 
914, leading to the amino acid substitutions Glu→Lys at position 
104 and Gly→Ser at position 238; and the nucleotide 
change A→G at position 1022, leading to the amino acid 
substitution Asn→Asp at position 276. The two amino acid 
substitutions at positions 69 and 276 are observed in the IRT-4-TEM-35 
enzyme (2, 8, 22), and the two amino acid substitutions 
at positions 104 and 238 are observed in the extended-spectrum 
β-lactamase TEM-15 (16). 
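
The four point mutations can be cross-checked against the standard genetic code. The Python sketch below lists, for each reported change, the codons of the original amino acid that a single nucleotide change of the stated type would turn into a codon for the new amino acid. It is an illustrative consistency check only: the actual codons present in the bla gene are not given in the text, and the function name candidate_codons is an assumption of this example.

```python
# Standard genetic code in TCAG order ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def candidate_codons(from_aa, to_aa, from_nt, to_nt):
    """Codons for from_aa that one from_nt->to_nt change converts to a to_aa codon."""
    hits = []
    for codon, aa in CODON_TO_AA.items():
        if aa != from_aa:
            continue
        for pos, nt in enumerate(codon):
            if nt != from_nt:
                continue
            mutant = codon[:pos] + to_nt + codon[pos + 1:]
            if CODON_TO_AA[mutant] == to_aa:
                hits.append((codon, mutant))
    return hits

# Residue number: (original aa, new aa, original nt, new nt), as reported above.
substitutions = {
    69: ("M", "L", "A", "C"),
    104: ("E", "K", "G", "A"),
    238: ("G", "S", "G", "A"),
    276: ("N", "D", "A", "G"),
}
for residue, change in substitutions.items():
    print(residue, candidate_codons(*change))
```

Every reported substitution has at least one compatible codon pair, and for Met-69 the pair is unique because Met has a single codon.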

β-Lactam MICs for TEM mutants. Consequently, we compared 
the MICs of β-lactams for E. coli GR102 and its transconjugant 
GR202, producing a complex mutant form of TEM 
(CMT-1), with MICs for IRT-4-producing E. coli CF0042, 
TEM-15-producing E. coli CF244, and TEM-1-producing E. 
coli HB101 (Table 2). 

For E. coli GR202 producing CMT-1, the MICs of AMX-CA 
(64 µg/ml) and TIC-CA (64 µg/ml) were much lower than 


TABLE 2. MICs of β-lactams for E. coli CMT-1 (GR102 and its transconjugant, GR202), E. coli TEM-15 (CF244), 
E. coli IRT-4 (CF0042), and E. coli TEM-1 (HB101) 

MIC (µg/ml) of each drug, given as alone/with CA^a: 

E. coli strain      AMX           TIC           CF            CTX           CAZ           ATM           FEP           CPO

GR102 (CMT-1)       4,096/256     >4,096/512    256/128       4/0.5         4/1           1/0.25        8/1           32/4
GR202^b (CMT-1)     2,048/64      4,096/64      8/4           1/0.03        1/0.12        0.12/0.06     1/0.03        2/0.12
CF244^c (TEM-15)    >4,096/16     >4,096/32     128/4         8/0.06        16/0.25       4/0.12        1/0.03        2/0.03
CF0042 (IRT-4)      4,096/2,048   1,024/512     8/4           ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06
HB101 (TEM-1)       4,096/16      4,096/32      8/2           ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06
HB101^d             4/4           1/1           4/4           ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06   ≤0.06/≤0.06

^a CA was used at 2 µg/ml. 
^b E. coli HB101 transconjugant. 
^c E. coli DH5α transformant. 
^d E. coli recipient strain. 



1324 SIROT ET AL. 



TABLE 3. Production of TEM-type β-lactamases in E. coli 

Enzyme          pI     Producing organism    Activity in crude extract^a    Sp act^b of purified protein

TEM-1           5.4    HB101                 2.2                            2.48
TEM-35-IRT-4    5.2    CF0042                2.1                            2.17
TEM-15          6.0    CF244                 0.4                            0.083
TEM-50-CMT-1    5.6    GR202                 0.4                            0.23

^a Micromoles of benzylpenicillin per minute per milligram of protein. 
^b Micromoles of benzylpenicillin per minute per microgram of protein. Determined 
with highly purified preparations (>97% pure). 



ANTIMICROB. AGENTS CHEMOTHER. 



TABLE 5. Inhibition of TEM-1 and its mutant forms by CA, 
sulbactam, and tazobactam 

                 IC50 (µM), Ki (µM)
Enzyme           CA             Sulbactam      Tazobactam

TEM-1            0.08, 0.1      6.1, 0.9       0.1, 0.01
TEM-35-IRT-4     28, 27         304, 49        1.8, 0.6
TEM-15           0.01, 0.02     0.03, 0.02     0.01, 0.008
TEM-50-CMT-1     0.25, 0.7      0.5, 0.4       0.04, 0.06


those observed for IRT-4-producing strain CF0042 (2,048 and 
512 µg/ml, respectively). Similarly, for strain GR202, MICs of 
CTX (1 µg/ml) and CAZ (1 µg/ml) were lower than those 
observed for TEM-15-producing strain CF244 (8 and 16 µg/ml, 
respectively) and higher than those for IRT-4-producing strain 
CF0042 (≤0.06 µg/ml). The same 1:8 ratio of MICs was observed 
for aztreonam (0.12 µg/ml for the CMT-1 producer and 
4 µg/ml for the TEM-15 producer). The MICs of cefepime and 
cefpirome (1 and 2 µg/ml) were identical for the CMT-1 and 
TEM-15 producers. 

E. coli GR202 (CMT-1) was 2 to 4 times less susceptible to 
AMX or TIC plus CA (64 µg/ml) and 16 times less susceptible 
to CTX, CAZ, FEP, and CPO (1 to 2 µg/ml) than was E. coli 
HB101 (TEM-1). MICs of β-lactam substrates in the presence 
of 2 and 4 µg of sulbactam or tazobactam per ml were 
about fourfold lower for the CMT-1-producing strain than for 
the TEM-1-producing strain (data not shown). 
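
The fold differences quoted above follow directly from the Table 2 MICs. The short Python sketch below recomputes them for the CMT-1 transconjugant against the TEM-1 producer. It is a plain arithmetic illustration: the dictionary layout and function names are assumptions of the example, and the TEM-1 value for CTX is a reported bound (0.06 µg/ml), so that ratio is only approximate.

```python
import math

# MICs in µg/ml from Table 2: AMX and TIC with CA, CTX alone.
mics = {
    "GR202 (CMT-1)": {"AMX+CA": 64, "TIC+CA": 64, "CTX": 1},
    "HB101 (TEM-1)": {"AMX+CA": 16, "TIC+CA": 32, "CTX": 0.06},  # CTX is a bound
}

def fold_change(test_mic, reference_mic):
    """How many times higher the test MIC is than the reference MIC."""
    return test_mic / reference_mic

def doubling_dilutions(test_mic, reference_mic):
    """The same difference expressed in twofold dilution steps."""
    return math.log2(test_mic / reference_mic)

for drug, value in mics["GR202 (CMT-1)"].items():
    reference = mics["HB101 (TEM-1)"][drug]
    print(drug, round(fold_change(value, reference), 1),
          round(doubling_dilutions(value, reference), 1))
```

Expressing the ratio in doubling dilutions is often convenient because agar dilution MICs are read on a twofold scale.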

Enzymatic and kinetic parameters of β-lactamases. Enzymatic 
and kinetic parameters of the new complex mutant enzyme 
CMT-1 with regard to penicillins and cephalosporins 
were compared with those of the TEM-1, IRT-4, and TEM-15 
β-lactamases (Tables 3 to 5). The specific activity of the highly 
purified CMT-1 protein was 10-fold lower than that of TEM-1 
(Table 3). 

For all penicillins, the kcat values of CMT-1 were about 
10-fold lower than those of TEM-1 and IRT-4 and about twice 
as high as those of TEM-15. The catalytic efficiencies (kcat/Km) 
of the three mutant enzymes were lower than those of the 



TABLE 4. Comparison of the kinetics^a of TEM-1 
and its mutant forms 

Drug                TEM-1                TEM-35-IRT-4

Benzylpenicillin    1,200, 25, 48.0      1,050, 140, 7.5
AMX                 920, 26, 35.4        900, 245, 8.5
TIC                 115, 10, 11.5        125, 320, 0.4
Carbenicillin       132, 13, 10.2        120, 360, 0.3
Piperacillin        987, 45, 21.9        945, 320, 2.9
CF                  122, 250, 0.5        52, 1,200, 0.04
Cephaloridine       2,045, 800, 2.5      340, 1,420, 0.2
Cefoperazone        470, 260, 1.8        305, 1,325, 0.2
Cefuroxime          ND^b                 ND
Ceftriaxone         ND                   ND
CTX                 1.2, ND              ND
CAZ                 ND                   ND
ATM                 ND                   ND

TEM-15 column, entries in printed order (alignment with the drug rows is not 
recoverable from the scan): 26, 5, 5.2; 8, 2, 4.0; 7, 3, 2.3; 64, 12, 5.3; 
43, 23, 1.7; 37, 30, 1.2; 25, 22, 1.1; 24, 91, 0.3; 92, 50, 1.8; 
180, 100, 1.8; 7, 80, 0.1; 0.2, ND 

TEM-50-CMT-1 column, entries in printed order (alignment with the drug rows 
is not recoverable from the scan): 70, 33, 2.1; 15, 30, 0.5; 13, 60, 0.2; 
111, 31, 3.6; 62, 324, 0.2; 320, 310, 1.03; 150, 118, 1.3; 20, 260, 0.08; 
35, 385, 0.09; 150, 873, 0.2; 3, ND; 1, ND 

^a The standard deviation for analysis was ≤10%. 
^b ND, not detected; the rate was too small to determine kcat and Km reliably. 



TEM-1 enzyme, and the values observed for CMT-1 and IRT-4 
with carboxy- and ureidopenicillins were similar. 
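
The catalytic efficiencies discussed here are the ratio kcat/Km. The Python sketch below recomputes that ratio for three TEM-1 entries of Table 4. It rests on assumptions: each Table 4 entry is read as kcat, Km, kcat/Km (an order inferred from the arithmetic, since the column subheadings did not survive), and the units are taken to be s^-1 and µM as used in the text for kcat and Km. The function name is hypothetical.

```python
# (kcat in s^-1, Km in µM) for TEM-1, read from Table 4 under the stated assumption.
tem1 = {
    "benzylpenicillin": (1200, 25),
    "AMX": (920, 26),
    "TIC": (115, 10),
}

def catalytic_efficiency(kcat, km):
    """kcat/Km, here in µM^-1 s^-1."""
    return kcat / km

for drug, (kcat, km) in tem1.items():
    print(drug, round(catalytic_efficiency(kcat, km), 1))
```

The recomputed ratios agree with the third number printed for each of these TEM-1 entries (48.0, 35.4, and 11.5), which supports the assumed reading of the table.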

For cephalosporins, kcat values of CMT-1 were slightly lower 
than or similar to those of TEM-15 for ceftriaxone, CTX, 
CAZ, and ATM; however, the catalytic efficiencies of the 
CMT-1 enzyme were only 5 to 11% of those of TEM-15 with 
ceftriaxone and CTX. No activity of TEM-1 or IRT-4 against 
expanded-spectrum cephalosporins and ATM (kcat values of <1 s^-1 
associated with Km values of >500 µM) was detected. 

The IC50 of CA for CMT-1 (Table 5) was higher (0.25 µM) 
than that for TEM-1 (0.08 µM) and TEM-15 (0.01 µM) but 
100-fold lower than that for IRT-4 (28 µM). Sulbactam was the 
least efficient inhibitor of IRT-4 (IC50, 304 µM), while its 
inhibitor efficiency was similar to that of CA for CMT-1 (IC50, 
0.5 µM). Tazobactam was the most efficient inhibitor of all of 
these β-lactamases. Moreover, CMT-1 (IC50, 0.04 µM) and 
TEM-15 (IC50, 0.01 µM) were more susceptible to inhibition 
by tazobactam than was TEM-1 (IC50, 0.1 µM). 
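
The relative resistance to clavulanate can be read off as ratios of the IC50 values in Table 5. The Python sketch below computes the ratios between enzymes; the variable and function names are assumptions of this example, and the values are the CA IC50s as printed.

```python
# IC50 of CA in µM, from Table 5.
ic50_ca = {"TEM-1": 0.08, "TEM-15": 0.01, "IRT-4": 28, "CMT-1": 0.25}

def fold(enzyme, reference):
    """IC50 of one enzyme relative to another."""
    return ic50_ca[enzyme] / ic50_ca[reference]

print("IRT-4 vs TEM-1:", round(fold("IRT-4", "TEM-1")))
print("CMT-1 vs TEM-1:", round(fold("CMT-1", "TEM-1"), 1))
print("IRT-4 vs CMT-1:", round(fold("IRT-4", "CMT-1")))
```

The ratios come out at roughly 350, 3, and 110, in line with the rounded figures of 350-fold, 3 times, and 100-fold used in the text.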

DISCUSSION 

The TEM-1 derivative described in this report constitutes a 
new type of complex mutant, CMT-1, combining mutations 
responsible for inhibitor resistance (Leu-69 and Asp-276) and 
those responsible for extended-spectrum activity (Lys-104 and 
Ser-238). It is the first example of such a β-lactamase produced 
by a clinical isolate of E. coli. 

Mutations conferring resistance to β-lactam inhibitors. Replacement 
of methionine 69, just adjacent to serine 70, by 
aliphatic amino acids such as leucine influences the positioning 
of residues (5, 17) because the buried side chain at position 69 
lies behind β-strand B3, forming the back wall of the oxyanion 
pocket in which the β-lactam's carbonyl group is polarized 
(12). Moreover, crystallographic data indicate that residues in 
the C-terminal α helix, such as Asn-276, restrict the mobility of 
the Arg-244 side chain and so play a role in maintaining the 
integrity of the active site (11). Because small β-lactams such 
as CA must rely primarily on attractive interactions with the 
oxyanion hole and Arg-244, inhibitor resistance exists in the 
natural variant IRT-4, containing changes at residues 69 and 
276 (12). This IRT-4 mutant enzyme is one of the most resistant 
to inhibition by CA among the IRT-type enzymes (2, 22), 
with a CA IC50 350-fold higher than that for TEM-1. The CA 
resistance of this mutant is confirmed by high-level resistance 
to combinations of AMX and TIC with CA (MIC, 2,048 and 
512 µg/ml, respectively). Inhibition studies showed that the 
CMT-1 enzyme was 100-fold less resistant than the IRT-4 
mutant but only 3 times as resistant to inhibition by CA as the 
wild-type TEM-1 β-lactamase. These kinetic results were closely 
related to the moderate resistance level (64 µg/ml) of the 
CMT-1-producing strain to AMX-CA or TIC-CA. The inhibitor 
resistance usually caused by the Leu-69 mutation may be 
decreased by the close proximity of the Ser-238 mutation (see 
below). 

Mutations conferring extended-spectrum activity. The Glu→Lys
change at position 104 contributes to the precise positioning
of residues 130 to 132 (SDN loop), which are involved in
substrate binding, but seems insufficient alone to confer true
resistance to expanded-spectrum cephalosporins (18, 20).

It is generally recognized that the substitution Gly→Ser-238
enlarges the active site, thereby creating an enzyme with increased
affinity for the 7-oxyimino cephalosporins (6). All of
the TEM variants reported to have Ser-238 contain methionine
at position 69, and mutant enzyme CMT-1 is the first harboring
both mutations Ser-238 and Leu-69. The affinity may be affected
by a change at position 69, since the side chain at
position 238, on the inner side of the B3 β-strand, lies very
close to the side chain of residue 69 (12).

The TEM-1 variant, with the associated changes Glu→Lys-104
and Gly→Ser-238, is TEM-15 (16). Complex mutant CMT-1,
with these last mutations, conferred a lower level of resistance
to CTX and CAZ than did TEM-15, and this difference correlated
with the kinetic constants. The kinetic comparison of
substrate hydrolysis in extended-spectrum β-lactamases TEM-15
and CMT-1 revealed that the catalytic efficiency (kcat/Km) of
CMT-1 was lower than that of TEM-15 for CTX (10-fold) and
ceftriaxone (20-fold).

Overall, the hydrolytic properties of this complex TEM mu- 
tant enzyme were found to be closer to those of an extended- 
spectrum enzyme than to those of an inhibitor-resistant en- 
zyme. However, the predominant effect of the mutations Lys- 
104 and Ser-238, which are responsible for extended-spectrum 
activity and inhibitor hypersusceptibility, was clearly attenu- 
ated by the mutations Leu-69 and Asp-276. In the strain pro- 
ducing CMT-1, complete reversal of CA resistance by muta- 
tions enhancing activity against 7-oxyimino cephalosporins was 
not observed as reported in a Ser-164-Ser-244 mutant ob- 
tained by site-specific mutagenesis (9). 

E. coli GR102 was isolated from a patient treated with CPO 
(4 g/day) and amikacin (1 g/day) for 12 days. An E. coli strain 
with a typical β-lactam inhibitor resistance phenotype and the
same resistances to other antibiotics had been isolated from 
the same sample (feces) from this patient a week before. This 
suggests that an inhibitor-resistant TEM mutant enzyme with 
the Leu-69 and Asp-276 mutations emerged first, and then, 
under antibiotic (CPO) and mutagenic agent (cytarabin and 
daunorubicin) pressure, this mutant underwent the two addi- 
tional mutations, Lys-104 and Ser-238, responsible for extend- 
ed-spectrum activity. Unfortunately, this hypothesis could not 
be confirmed since the initial E. coli IRT-producing strain was
not kept. 

In conclusion, the production of this complex TEM mutant 
cannot alone account for the high-level multiresistance to 
β-lactams of the E. coli GR102 isolate, in which several resistance
mechanisms were involved (TEM-1 and, probably, decreased
permeability). It would probably be more beneficial for
an E. coli strain to produce two different TEM mutants (an 
extended-spectrum mutant and an inhibitor-resistant mutant) 
simultaneously than to produce one double mutant. If each 
mutant conferred its own resistance phenotype, high-level re- 
sistance to both CA combinations and extended-spectrum 
cephalosporins could then be expected. 

β-Lactamases inactivate penicillin and cephalosporin antibiotics by hydrolysis of the β-lactam ring and are
an important mechanism of resistance for many bacterial pathogens. Four wild-type variants of Staphylococcus
aureus β-lactamase, designated A, B, C, and D, have been identified. Although distinguishable kinetically, they
differ in primary structure by only a few amino acids. Using the reported sequences of the A, C, and D enzymes
along with crystallographic data about the structure of the type A enzyme to identify amino acid differences
located close to the active site, we hypothesized that these differences might explain the kinetic heterogeneity
of the wild-type β-lactamases. To test this hypothesis, genes encoding the type A, C, and D β-lactamases were
modified by site-directed mutagenesis, yielding mutant enzymes with single amino acid substitutions. The
substitution of asparagine for serine at residue 216 of type A β-lactamase resulted in a kinetic profile
indistinguishable from that of type C β-lactamase, whereas the substitution of serine for asparagine at the
same site in the type C enzyme produced a kinetic type A mutant. Similar bidirectional substitutions identified
the threonine-to-alanine difference at residue 128 as being responsible for the kinetic differences between the
type A and D enzymes. Neither residue 216 nor 128 has previously been shown to be kinetically important
among serine-active-site β-lactamases.



β-Lactam antibiotics, including the penicillins and cephalosporins,
are important agents in the therapy of bacterial infections.
However, in some clinical settings the usefulness of these
agents has been diminished by the emergence and spread of
bacterial strains that produce β-lactamase, which hydrolyzes
the β-lactam ring and inactivates the drug's antimicrobial effect
(27). This problem has been demonstrated most dramatically
with Staphylococcus aureus. Whereas the vast majority of clinical
isolates of S. aureus were highly susceptible to penicillin G
at the time of its introduction into clinical use in the early
1940s, the spread of β-lactamase-producing, penicillin-resistant
strains was so widespread by the late 1940s that penicillin
G was no longer a reliable antistaphylococcal agent. Most
clinical isolates of S. aureus produce β-lactamase (36).

Four types of S. aureus β-lactamase have been identified by
serologic (34, 35) and kinetic (19, 20) methods. These variants
originally were designated types A, B, C, and D by Richmond
(34) and Rosdahl (35). This nomenclature should not be confused
with that of the different classes of β-lactamases, A
through D, that has been used more recently to group the
β-lactamases of all bacterial species on the basis of active site
(serine versus zinc), size, and kinetic characteristics (4). Each
of the four recognized types of S. aureus β-lactamase (A, B, C,
and D) is a class A β-lactamase with a serine active site. The
mature form of the enzyme has a molecular mass of 30 kDa,
contains 257 amino acids, and is excreted extracellularly (1).

The type A and C staphylococcal β-lactamases are easily
distinguished kinetically, either by substrate profile (18) or by
Km and catalytic constant (kcat) determinations (43). Whereas
the kinetics of hydrolysis exhibited by the type A and D staphylococcal
β-lactamases are similar (20), they can be rapidly and
reproducibly distinguished by a fivefold difference in the ratio
of the rates of the initial velocity of hydrolysis of penicillin G
and cefazolin (43). The lower penicillin G/cefazolin hydrolysis
ratio of the type D β-lactamase appears to be related to a lower
kcat for the hydrolysis of penicillin G by purified type D enzyme
than the other staphylococcal β-lactamases (43).

The genes encoding the type A, C, and D β-lactamases of S.
aureus are located on plasmids and have been cloned and
sequenced (5, 8, 9, 40). The deduced amino acid sequences
identify six amino acid differences in the primary sequence of
the prototypic type A β-lactamase from strains PC1 and S1
compared with the type C sequence from strain 3804, including
amino acids 116, 202, 205, 206, 212, and 216. In addition, there
are five differences, at amino acids 93, 121, 128, 226, and 229,
between the prototypic type A β-lactamase and the type D
β-lactamase produced by strain FAR4. The type C and D
enzymes differ by 11 amino acids.
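
The count of 11 differences between the type C and D enzymes follows from the two position lists, because they share no residue. The sketch below is an illustration rather than part of the study; the variable and function names are hypothetical, and it assumes that a position differing from type A in only one of the two variants necessarily differs between the variants themselves.

```python
# Positions at which each variant differs from the prototypic type A sequence,
# as listed in the text.
type_c_vs_a = {116, 202, 205, 206, 212, 216}
type_d_vs_a = {93, 121, 128, 226, 229}


def guaranteed_differences(first_vs_ref, second_vs_ref):
    """Positions that differ from the reference in exactly one variant.

    At such a position one variant matches the reference and the other does
    not, so the two variants must differ from each other there.
    """
    return first_vs_ref ^ second_vs_ref  # symmetric difference


if __name__ == "__main__":
    positions = guaranteed_differences(type_c_vs_a, type_d_vs_a)
    print(sorted(positions), len(positions))
```

Because the two lists are disjoint, the symmetric difference equals their union, which gives the 11 positions stated above.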

The molecular basis for the kinetic heterogeneity exhibited
by the staphylococcal β-lactamases has not been determined.
However, the structure of the type A variant of S. aureus
β-lactamase is known from X-ray crystallographic analysis (11,
12), and several of the amino acids which are different between
the type A, C, and D β-lactamases are located close to the
active-site cleft. To test the hypothesis that single amino acid
differences at sites close to the active site are responsible for
the kinetic heterogeneity exhibited by naturally occurring variants
of S. aureus β-lactamase, we constructed mutant β-lactamases
with single amino acid substitutions at sites where the
wild-type enzymes differ and evaluated the kinetics of the mutant
enzyme.






Vol. 178, 1996 AMINO ACIDS 128 AND 216 OF S. AUREUS β-LACTAMASE 7249



TABLE 1. Plasmids and phagemids used in this study

Plasmid | Host | Resistance^a | Description | Source or reference
[illegible] | S. aureus | Amp | [illegible] | 18
pII3804 | S. aureus | Amp | From strain 3804, carries blaZ encoding type C β-lactamase | 9
pUB101 | S. aureus | Amp | From strain FAR4, carries blaZ encoding type D β-lactamase | 25
pE194 | S. aureus | Em | From strain [illegible], carries gene encoding Em^r | 13
pBC SK+ | E. coli | | Phagemid vector | Stratagene
pTZ18R | E. coli | Amp | Phagemid vector | Bio-Rad
pVK100 | S. aureus-E. coli | Em, Cm | E. coli-S. aureus shuttle plasmid constructed by ligating pE194 into pBC SK+ [restriction sites illegible] | This study
pVK101 | S. aureus-E. coli | Em, Cm | pVK100 with a 4-kb EcoRI-HindIII fragment carrying β-lactamase regulatory genes from pI3796 but lacks blaZ | This study
pVK102 | E. coli | Amp | pTZ18R with 4-kb HincII fragment with type A blaZ from pS1 | This study
pVK103 | E. coli-S. aureus | Amp, Em | pVK101 with 1.1-kb HindIII fragment with type A blaZ from pVK102 | This study
pVK104 | E. coli | Amp | pBC SK+ with 8-kb EcoRI fragment with type C blaZ from pII3804 | This study
pVK105 | E. coli-S. aureus | Amp, Em | pVK101 with 1.8-kb HindIII fragment with type C blaZ from pVK104 | This study
pVK106 | E. coli-S. aureus | Amp, Em | pVK101 with 1.3-kb HindIII fragment with type D blaZ from pUB101 | This study
pVK107 | E. coli-S. aureus | Amp, Em | pVK101 with 1.1-kb HindIII type A blaZ fragment except a single nucleotide substitution leading to S216N change | This study
pVK108 | E. coli-S. aureus | Amp, Em | pVK101 with 1.1-kb HindIII type A blaZ fragment except a single nucleotide substitution leading to T128A change | This study
pVK109 | E. coli-S. aureus | Amp, Em | pVK101 with 1.8-kb HindIII type C blaZ fragment except a single nucleotide substitution leading to N216S change | This study
pVK110 | E. coli-S. aureus | Amp, Em | pVK101 with 1.3-kb HindIII type D blaZ fragment except a single nucleotide substitution leading to A128T change | This study

a Amp, ampicillin; Cm, chloramphenicol; Em, erythromycin.







MATERIALS AND METHODS

Chemicals and media. Standard powders of nitrocefin (BBL Microbiology
Systems, Cockeysville, Md.); cephaloridine (Sigma Chemical Co., St. Louis,
Mo.); methicillin, ampicillin, and cephapirin (Bristol Laboratories, Syracuse,
N.Y.); and cefazolin and penicillin G (Eli Lilly & Co., Indianapolis, Ind.) were
used to prepare antibiotic solutions for kinetic studies. Cation-exchange resin
P11 (Whatman Laboratories, Kent, England) and meta-aminophenyl boronic
acid hemisulfate and succinamide-activated Sepharose (Sigma) were used for
preparing columns for β-lactamase purification. Restriction endonucleases, T4
DNA ligase, and Sequenase were purchased from United States Biochemical
Corp., Cleveland, Ohio. A Muta-Gene Phagemid kit (Bio-Rad Laboratories,
Fullerton, Calif.) was used for site-directed mutagenesis. The oligonucleotides
required for site-directed mutagenesis and sequencing of the β-lactamase gene
were synthesized on a Cyclone Plus Automatic DNA synthesizer (Millipore,
Bedford, Mass.) by the DNA Core Facility, Department of Molecular Physiology
and Biophysics, Vanderbilt University. Modified 1% CY broth was prepared as
described by Novick (29). LB and 2x YT media were prepared according to the
methods of Maniatis et al. (26). Tryptic soy broth was purchased from BBL.

Bacterial strains, plasmids, and cultivation. The plasmids and phagemids
used and constructed during this study are listed in Table 1. S. aureus RN4220
(R. P. Novick, Public Health Research Institute, New York, N.Y.) was used as a
recipient for protoplast transformation. Escherichia coli DH5α was used for the
transformation and propagation of E. coli plasmids and S. aureus-E. coli shuttle
plasmids except when indicated otherwise.

Construction of S. aureus β-lactamase expression vector. An E. coli-S. aureus
shuttle vector, pVK101, was constructed with erythromycin as a selectable
marker in S. aureus, by using a strategy similar to that employed by East et al. (8)
to construct S. aureus β-lactamase expression vector pAETOft. First S. aureus
plasmid pE194 was digested with SalI and cloned into the SacI site of pBC SK+
(Stratagene, La Jolla, Calif.) to create pVK100. Next blaR (β-lactamase regulatory
gene) from S. aureus plasmid pI3796 was cloned on a 4-kb HindIII-EcoRI
fragment into pVK100 to produce pVK101.

Cloning of type A, C, and D blaZ. Large-scale isolation of plasmid DNA from
S. aureus by ultracentrifugation in a cesium chloride gradient (Varlacoid
Chemical Co., Inc., Bergenfield, N.J.) was performed by methods described by
Galetto et al. (10). A 4-kb HincII fragment carrying the blaZ (β-lactamase
structural gene) from pS1 was cloned into pTZ18R to create pVK102. HindIII
digestion of pVK102 enabled the type A blaZ to be mobilized on a 1.1-kb
fragment which was cloned into pVK101. This produced pVK103 in which blaZ
and blaR are transcribed divergently from the same intracistronic region. A
similar strategy was used to clone the type C blaZ from pII3804 into pVK101 via
a pBC SK+ intermediate (pVK104) to produce pVK105. The plasmid DNA
from pUB101 was digested with HindIII, and a 1.3-kb fragment carrying type D
blaZ was cloned in the desired orientation into pVK101 to produce pVK106.

Site-directed mutagenesis of β-lactamase. Oligonucleotide-directed mutagenesis
was performed by the method of Kunkel et al. (23, 24). HindIII fragments
containing the type A, C, and D blaZ genes were cloned individually into the
phage vector M13mp18 (42). Two oligonucleotides, each with a single mismatch
(underlined), were used to introduce specific mutations at amino acids 128
(5'TCACTATATGCCATTGAAGCC3'; Thr to Ala) and 216 (5'AGTATCTC
CGTTTTTATTATT3'; Ser to Asn) of the type A blaZ. Additional primers were
constructed to produce the reverse mutation in the type C and D blaZ genes (i.e.,
5'AGTGTCTCCGCTTTTATTATT, Asn to Ser, type C blaZ and 5'ATCACT
ATATGTCATTGAAGC, Ala to Thr, type D blaZ). Single-stranded DNA from
the six randomly picked plaques was sequenced by the dideoxy chain termination
method (38) using Sequenase enzyme to confirm the desired mutation. The
complete open reading frame was sequenced in order to rule out the presence of
any spurious mutations.
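
The codon changes encoded by the four mutagenic primers can be illustrated by translating their reverse complements. The following is a sketch under stated assumptions, not the authors' procedure: it assumes that the primers as printed are antisense to the blaZ coding strand, that the reading-frame offsets chosen here (hypothetical values picked because they give stop-free translations) reflect the true frame, and that the residue-128 Thr-to-Ala primer, which is partly garbled in this copy, reads as reconstructed below. All names in the code are illustrative.

```python
# Standard genetic code, indexed by base order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer sequences (5' to 3') with an assumed reading-frame offset for the
# reverse complement; the offsets are assumptions, not data from the paper.
PRIMERS = {
    "A, S216N": ("AGTATCTCCGTTTTTATTATT", 0),
    "C, N216S": ("AGTGTCTCCGCTTTTATTATT", 0),
    "A, T128A": ("TCACTATATGCCATTGAAGCC", 1),
    "D, A128T": ("ATCACTATATGTCATTGAAGC", 0),
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq, frame=0):
    """Translate complete codons of `seq`, starting at offset `frame`."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


if __name__ == "__main__":
    for name, (primer, frame) in PRIMERS.items():
        print(name, translate(reverse_complement(primer), frame))
```

Under these assumptions the two residue-216 primers translate to peptides that differ only by Asn versus Ser, and the two residue-128 primers to peptides that differ only by Ala versus Thr within their shared region.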

Expression of wild-type and mutant β-lactamases in S. aureus RN4220. Wild-type
blaZ encoding type A, C, and D β-lactamases and mutant blaZ genes from
M13mp18 recombinants were cloned as 1.1- to 1.8-kb HindIII fragments into the
HindIII site of S. aureus-E. coli shuttle plasmid pVK101. Recombinants were
selected on LB medium containing ampicillin (100 µg/ml) and chloramphenicol
(30 µg/ml). The desired orientation of the insert was verified either by restriction
with EcoRV, which has a single site within type A and D blaZ and one just
upstream of the blaZ promoter within blaR, or by PCR. Protoplast transformation
was used to transfer the recombinant shuttle plasmids into S. aureus RN4220
(6).

β-Lactamase purification from S. aureus. The wild-type β-lactamase-producing
strains and the RN4220 transformants were grown in 5 liters of modified 1%
CY medium containing methicillin (0.5 µg/ml) or 2-(2'-carboxyphenyl)benzoyl-
6-aminopenicillanic acid (7.5 µM, to induce β-lactamase production) at 37°C and
150 rpm for 18 h. The extracellular β-lactamases were purified by sequential
cation-exchange and affinity chromatography (21).

Enzyme kinetics. Initial velocities of hydrolysis were monitored at a wavelength
corresponding to the maximal change in absorbance between the unhydrolyzed
substrate and the hydrolyzed product, which included the following:
cephaloridine, 254 nm; cefazolin, 272 nm; nitrocefin, 482 nm; cephapirin, 258
nm; ampicillin, 235 nm; and penicillin G, 232 nm (37, 39). β-Lactamase assays
were performed in 0.1 M sodium phosphate buffer, pH 6.0, in 1-cm cuvettes at
37°C with a DU-70 recording spectrophotometer (Beckman Instruments, Fullerton,
Calif.). For Km and kcat determinations, assays of the initial velocity of
hydrolysis were performed using 100, 50, 33.3, 20, 14.2, and 11.1 µM solutions of
each cephalosporin antibiotic. Penicillin hydrolysis assays were performed at
initial substrate concentrations of 1,000, 500, 333, 200, 142, and 111 µM. The
maximal rate of hydrolysis, Vmax, and the Michaelis constant, Km, for each
substrate-enzyme combination were determined from [s]/v-against-[s] plots
(where [s] is substrate concentration and v is velocity) (41) with computerized
software (Hyper; Department of Biochemistry, University of Liverpool, Liverpool,
United Kingdom). The turnover number, kcat, was calculated from the Vmax
by using a molecular mass of the purified β-lactamase of 30,000 g/mol. Mean
values and standard error of the mean values were calculated from the results of

FIG. 1. Tertiary structure of S. aureus PC1 β-lactamase. The active-site cleft
of the β-lactamase is located at the left side of the β3 strand. The active-site
amino acids are S-70, S-130, N-132, K-234, and R-244. Amino acids T-128 and
S-216 were substituted by site-directed mutagenesis of the cloned structural
gene. The diagram was generated by the computer software MOLSCRIPT (22)
using X-ray crystallographic coordinates stored in the Brookhaven Protein Database
(accession no. 3BLM).



8 to 12 [s]/v-against-[s] plots for each enzyme-substrate combination by using
computer software (Minitab Data Analysis Software, release 10.2; Minitab, Inc.,
State College, Pa.).
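
The [s]/v-against-[s] (Hanes-Woolf) analysis described above can be sketched as a straight-line fit in which the slope equals 1/Vmax and the intercept equals Km/Vmax. The code below is a minimal illustration and not the Hyper or Minitab software used in the study; the function names, the synthetic velocities, and the enzyme concentration are hypothetical, while the substrate concentrations and the molecular mass of 30,000 g/mol are taken from the text.

```python
def hanes_woolf_fit(substrate, velocity):
    """Fit [s]/v = [s]/Vmax + Km/Vmax by least squares; return (Km, Vmax)."""
    y = [s / v for s, v in zip(substrate, velocity)]
    n = len(substrate)
    mean_s = sum(substrate) / n
    mean_y = sum(y) / n
    sxx = sum((s - mean_s) ** 2 for s in substrate)
    sxy = sum((s - mean_s) * (yi - mean_y) for s, yi in zip(substrate, y))
    slope = sxy / sxx                     # equals 1 / Vmax
    intercept = mean_y - slope * mean_s   # equals Km / Vmax
    return intercept / slope, 1.0 / slope


def kcat_from_vmax(vmax_um_per_s, enzyme_ug_per_ml, molecular_mass=30_000.0):
    """Turnover number (s^-1) from Vmax (µM/s) and enzyme concentration (µg/ml)."""
    enzyme_um = enzyme_ug_per_ml / molecular_mass * 1000.0  # µg/ml -> µM
    return vmax_um_per_s / enzyme_um


if __name__ == "__main__":
    substrate_um = [100, 50, 33.3, 20, 14.2, 11.1]  # cephalosporin series from the text
    true_km, true_vmax = 13.1, 0.05                 # hypothetical values for synthetic data
    velocities = [true_vmax * s / (true_km + s) for s in substrate_um]
    km, vmax = hanes_woolf_fit(substrate_um, velocities)
    print(f"Km = {km:.1f} µM, Vmax = {vmax:.3f} µM/s")
    print(f"kcat = {kcat_from_vmax(vmax, enzyme_ug_per_ml=1.0):.2f} s^-1")
```

With noise-free synthetic data the fit returns the Km and Vmax used to generate the velocities; real measurements would scatter around the line, which is why 8 to 12 replicate determinations were averaged.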

RESULTS 

Construction and expression of mutant β-lactamases. The
primary goal of this study was to determine the functional
domains, particularly the specific amino acid(s), responsible
for the kinetic heterogeneity observed among naturally occurring
variants of S. aureus β-lactamase. Mutant β-lactamases
with single amino acid substitutions at residues 128 (Thr to
Ala) and 216 (Ser to Asn) were constructed by using oligonucleotide-directed
mutagenesis and the type A β-lactamase
gene (blaZ) as a template. In addition, the reverse mutations
were introduced by using the type D (Ala to Thr, residue 128)
and type C (Asn to Ser, residue 216) blaZ genes as template
DNA. These residues were selected primarily because, of the
amino acids which differed among the type A, C, and D β-lactamases,
128 and 216 were closest to the active-site cleft (Fig.
1). Each blaZ was cloned into E. coli-S. aureus shuttle plasmid
pVK101, which was then transformed into S. aureus RN4220.
β-Lactamase production was induced, and the wild-type β-lactamases
expressed by the reference strains and the mutant
enzymes expressed in RN4220 were purified to homogeneity.

Effect of substitution at amino acid 216 on the kinetics of
type A and C β-lactamases. β-Lactams which have been shown
previously to be useful in distinguishing between the wild-type
S. aureus β-lactamases (20) were used to compare the kinetics
of hydrolysis of the reference and mutant enzymes (Tables 2
and 3). Between the type A and type C β-lactamases there
were a 10-fold difference in the Km values of cefazolin, a 5-fold
difference in the Km values of cephapirin, and a 5-fold difference
in the kcat values of nitrocefin. These differences appear
to be due to whether Ser or Asn was present at residue 216.
Replacement of Ser by Asn in the type A β-lactamase yielded
a mutant (A, S216N) that was closer kinetically to the type C
than the type A enzyme (e.g., cefazolin Km values: 167 µM,
mutant; 145 µM, type C; 13.1 µM, type A). The reverse mutation
using type C blaZ DNA for site-directed mutagenesis
yielded a mutant (C, N216S) that was similar to the type A
enzyme (cefazolin Km value, 15.3 µM). Also, the differences
between the kinetic type A and type C S. aureus β-lactamases
are clearly shown by comparing relative efficiency of hydrolysis
values (Table 4). The kcat values of most substrates other than
nitrocefin did not clearly distinguish between the type A and
type C enzymes (Table 3).

Effect of substitution at amino acid 128 on the kinetics of
type A and D β-lactamases. The reference type A β-lactamase
had a cefazolin Km value that was threefold lower than and an
ampicillin Km value that was twofold greater than the respective
Km values of the type D enzyme. In addition, the kcat
values of ampicillin and penicillin G were three- to fourfold
higher with the type A compared to the type D β-lactamase.
These differences were related to whether Ala or Thr was
present at residue 128. Replacement of Thr by Ala in the type
A β-lactamase yielded a mutant (A, T128A) that was closer
kinetically to the type D than the type A enzyme (e.g., penicillin
kcat values: 47 s^-1, mutant; 254 s^-1, type A; 66 s^-1, type





TABLE 2. Km values of β-lactam antibiotics for purified wild-type and mutant β-lactamases of S. aureus^a

Mean Km [µM (SEM)] for β-lactamase. Wild type: pS1, pII3804, and pUB101. Altered by site-directed mutagenesis: pVK107, pVK108, pVK109, and pVK110.

Antibiotic | pS1, type A | pII3804, type C | pUB101, type D | pVK107 (A, S216N) | pVK108 (A, T128A) | pVK109 (C, N216S) | pVK110 (D, A128T)
Cephaloridine | 4.3 (1.0) | 6.3 (1.8) | 4.6 (1.2) | 6.5 (1.7) | 5.3 (1.3) | 4.5 (0.7) | 4.8 (1.1)
Cefazolin | 13.1 (2.8) | 145 (18.7) | 38.3 (9.3) | 167 (35) | 43.9 (6.6) | 15.3 (2.7) | 13.3 (1.8)
Cephapirin | 4.5 (1.2) | 24.9 (1.7) | 7.1 (1.7) | 27.3 (3.9) | 7.0 (0.8) | 5.5 (1.5) | 4.5 (1.0)
Nitrocefin | 5.1 (1.3) | 10.5 (2.3) | 7.1 (1.6) | 11.6 (2.3) | 6.8 (1.5) | 6.7 (1.5) | 6.6 (1.7)
Ampicillin | 195 (35) | 128 (22) | 119 (22) | 153 (14) | 118 (15) | 208 (31) | 219 (43)
Penicillin G | 29.2 (6.5) | 25.5 (6.9) | 26.0 (8.2) | 29.0 (9.1) | 34.3 (9.4) | 31.3 (8.6) | 19.7 (6.5)

a The initial velocities of hydrolysis of 100, 50, 33.3, 20, 14.2, and 11.1 µM solutions of the cephalosporins and 1,000, 500, 333, 200, 142, and 111 µM solutions of
ampicillin and penicillin G were monitored with a recording spectrophotometer. Km values were determined by the use of [s]/v-against-[s] plots. Each value represents
the mean from 8 to 12 determinations. Km values that are altered by amino acid replacements are in boldface.

TABLE 3. kcat values of β-lactam antibiotics for purified wild-type and mutant β-lactamases of S. aureus

Mean kcat value [s^-1 (SEM)]^a for β-lactamase. Wild type: pS1, pII3804, and pUB101. Altered by site-directed mutagenesis: pVK107, pVK108, pVK109, and pVK110.

Antibiotic | pS1, type A | pII3804, type C | pUB101, type D | pVK107 (A, S216N) | pVK108 (A, T128A) | pVK109 (C, N216S) | pVK110 (D, A128T)
Cephaloridine | 1.5 (0.3) | 2.6 (0.2) | 0.97 (0.2) | 3.4 (0.4) | 0.83 (0.1) | 0.74 (0.17) | 1.37 (0.1)
Cefazolin | 1.4 (0.1) | 1.8 (0.1) | 2.1 (0.1) | 2.51 (0.3) | 2.24 (0.1) | 0.86 (0.1) | 0.92 (0.1)
Cephapirin | 0.37 (0.01) | 0.6 (0.01) | 0.21 (0.02) | 0.75 (0.1) | 0.19 (0.03) | 0.27 (0.02) | 0.36 (0.01)
Nitrocefin | 33.7 (4.6) | 6.9 (0.4) | 16.3 (2.0) | 8.0 (0.9) | 15.3 (3.4) | 33.1 (4.1) | 37.0 (4.4)
Ampicillin | 560 (49) | 249 (14) | 133 (23) | 353 (28) | 91 (10) | 392 (33) | 423 (46)
Penicillin G | 254 (17) | 140 (5) | 65.6 (6) | 162 (14) | 47.3 (6.8) | 173 (16) | 192 (12)

a Mean value in molecules of antibiotic hydrolyzed per second per molecule of enzyme, as determined from [s]/v-against-[s] plots and assuming a molecular mass of
the purified β-lactamase of 30,000 g/mol. Each value is derived from 8 to 12 determinations. kcat values that are altered by amino acid replacements are in boldface.



D). The reverse mutation using type D blaZ DNA for site-directed
mutagenesis yielded a mutant (D, A128T) that was
similar to the type A enzyme (penicillin G kcat value, 192 s^-1).
The type A and D enzymes are easily distinguished by the ratio
of the rates of hydrolysis of penicillin G and cefazolin (Table
5).

DISCUSSION 

Naturally occurring variants of S. aureus β-lactamase can be
distinguished on the basis of the kinetics of hydrolysis of se- 
lected penicillin and cephalosporin antibiotics (20, 43). In this 
study we have shown that these kinetic differences are deter- 
mined by single amino acid substitutions at positions close to 
the active site of the enzyme. Specifically, the presence of an 
Asn instead of Ser at residue 216 determines a type C kinetic 
profile, and the presence of an Ala instead of Thr at residue 
128 determines a type D kinetic profile. Enzymes exhibiting the 
type A kinetic profile have Ser and Thr at these two sites, 
respectively. We also found that substitutions at some positions 
other than 128 and 216 where the type A, C, and D enzymes 
have different amino acids did not alter the kinetic profile of 
the enzyme (e.g., amino acid 121 [data not shown]). 

This situation is reminiscent of what has been reported for
the newer TEM-type β-lactamases and SHV β-lactamase variants
among gram-negative bacterial species (17). Broad-spectrum
TEM and SHV variant β-lactamases capable of hydrolyzing
ceftazidime, cefotaxime, and/or other newer
cephalosporins have become problematic in many medical centers
in recent years, especially among isolates of Klebsiella
pneumoniae and E. coli (14, 17, 31). As with the S. aureus



enzymes, the altered kinetic profile of the broad-spectrum
TEM-type β-lactamases is based on modest changes in primary
structure, generally one to three amino acid substitutions at
sites close to the active site (32). In addition, single mutations
that result in TEM and SHV variant β-lactamases exhibiting
resistance to commercially available β-lactamase inhibitors
such as clavulanic acid have been described (7, 15).

A major difference between the histories of the variant TEM
and staphylococcal enzymes, however, is that whereas the
former appear to be a consequence of selective antibiotic pressure
in the clinical setting, the S. aureus enzymes appear to
have remained remarkably stable over time. Although β-lactamase-producing
strains of S. aureus spread widely during the
first few years following the clinical introduction of penicillin,
some clinical isolates collected and saved prior to penicillin use
have been shown to produce the A and C variants of S. aureus
β-lactamase. Furthermore, the prevalence of the various types
of staphylococcal β-lactamases among clinical isolates in the
United States in the 1980s (19, 20) is similar to what was
reported in England by Richmond in the mid-1960s when he
first described the existence of different staphylococcal β-lactamase
serotypes (34). Despite the widespread use since the
early 1960s of antistaphylococcal penicillins such as methicillin,
oxacillin, and nafcillin, new staphylococcal β-lactamases capable
of hydrolyzing these agents efficiently have not been observed.

Neither of the amino acid positions that we have shown to be
responsible for the kinetic differences among the wild-type S.
aureus enzymes has previously been cited as contributing
to β-lactamase function. The major kinetic difference between



TABLE 4. Relative efficiencies of hydrolysis for β-lactam antibiotics of purified wild-type and mutant β-lactamases of S. aureus

REH^a (%) of β-lactamase. Wild type: pS1, pII3804, and pUB101. Altered by site-directed mutagenesis: pVK107, pVK108, pVK109, and pVK110.

Antibiotic | pS1, type A | pII3804, type C | pUB101, type D | pVK107 (A, S216N) | pVK108 (A, T128A) | pVK109 (C, N216S) | pVK110 (D, A128T)
Cephaloridine | 100 | 100 | 100 | 100 | 100 | 100 | 100
Cefazolin | 32 | 3 | 24 | 3 | 28 | 24 | 25
Cephapirin | 24 | 6 | 14 | 5 | 17 | 30 | 29
Nitrocefin | 1,943 | 159 | 1,095 | 133 | 1,406 | 1,980 | 2,004
Ampicillin | 820 | 473 | 562 | 449 | 481 | 868 | 690
Penicillin G | 2,565 | 1,336 | 1,200 | 1,071 | 863 | 2,208 | 3,578

a Relative efficiency of hydrolysis (REH) is a relative value of kcat/Km. These values are expressed as a percentage of the REH for cephaloridine by the same
β-lactamase (i.e., the cephaloridine REH was calculated in liters per mole per second and assigned a value of 100), determined from the mean kcat and Km values for
each enzyme-substrate combination. REH values that are altered by amino acid replacements are in boldface.
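
The relative efficiency of hydrolysis defined in the footnote to Table 4 can be recomputed from the mean kcat and Km values. The sketch below is illustrative and not the authors' calculation; the names are hypothetical, and the inputs are the type A (pS1) means from Tables 2 and 3. Values recomputed from rounded means approximate the tabulated percentages but need not match them exactly.

```python
# Mean (kcat in s^-1, Km in µM) for the wild-type type A enzyme (pS1),
# taken from Tables 3 and 2, respectively.
type_a = {
    "Cephaloridine": (1.5, 4.3),
    "Cefazolin": (1.4, 13.1),
    "Cephapirin": (0.37, 4.5),
    "Nitrocefin": (33.7, 5.1),
    "Ampicillin": (560.0, 195.0),
    "Penicillin G": (254.0, 29.2),
}


def relative_efficiency(params, reference="Cephaloridine"):
    """Return kcat/Km for each substrate as a percentage of the reference substrate."""
    ref_kcat, ref_km = params[reference]
    ref_efficiency = ref_kcat / ref_km
    return {
        name: 100.0 * (kcat / km) / ref_efficiency
        for name, (kcat, km) in params.items()
    }


if __name__ == "__main__":
    for antibiotic, reh in relative_efficiency(type_a).items():
        print(f"{antibiotic}: {reh:.0f}%")
```

For cefazolin, for example, the recomputed value is about 31%, close to the 32% listed for the type A enzyme.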

TABLE 5. Ratios of the rates of hydrolysis of penicillin G and
cefazolin by kinetic type A and D β-lactamases

β-Lactamase (type) | Relative kcat ratio^a | Fixed-concn ratio^b
Wild type
pS1 (A) | 178.7 | 183.5
pUB101 (D) | 32.5 | 36.6
Altered by site-directed mutagenesis
pVK108 (A, T128A) | 23.6 | 45.7
pVK110 (D, A128T) | 208.5 | 162.5

a Ratio of the kcat values of penicillin G and cefazolin.
b Determined from the initial velocities of hydrolysis using a 500 µM concentration
of penicillin G and 100 µM concentration of cefazolin.



losporins, particularly cefazolin and cephapirin (Table 2). 
These enzymes differ not only in hydrolysis of certain cepha- 
losporin substrates but also in the inhibition profile of some 
p-lactamase inhibitors, including sulbactam (unpublished ob- 
servations) and tazobactam (3), with the type A enzyme being 
more susceptible to inhibition. 
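
The normalization described in the footnote to Table 4 (kcat/Km for each substrate, expressed as a percentage of the value for cephaloridine with the same enzyme) can be sketched as follows. This is a minimal illustration, not the authors' calculation; the function names and the kinetic constants in the example are hypothetical.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """Return kcat/Km in liters per mole per second."""
    if km_molar <= 0:
        raise ValueError("Km must be positive")
    return kcat_per_s / km_molar


def relative_efficiency(efficiency_by_substrate, reference="cephaloridine"):
    """Express each kcat/Km as a percentage of the reference substrate's value.

    The reference substrate is assigned 100 by construction.
    """
    ref = efficiency_by_substrate[reference]
    return {s: 100.0 * v / ref for s, v in efficiency_by_substrate.items()}


# Hypothetical (kcat in s^-1, Km in M) pairs for one enzyme, for illustration only.
kinetics = {
    "cephaloridine": (2.0, 100e-6),
    "cefazolin": (0.5, 80e-6),
}
efficiencies = {s: catalytic_efficiency(k, km) for s, (k, km) in kinetics.items()}
reh = relative_efficiency(efficiencies)
```

Because each enzyme is normalized to its own cephaloridine value, REH values compare substrate preference within an enzyme and do not by themselves compare absolute rates between enzymes.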

The substitution at residue 216 might affect β-lactamase structure and
function in several ways. First, the side chain of Asn is bulkier (the
accessible surface area of Asn is 158 Å [15.8 nm] [28]) than that of Ser
(122 Å [12.2 nm]) and may hinder substrate binding in the active-site
cleft. Modelling studies with cefazolin docked into the PC1 β-lactamase
active-site cleft show that the side chain substituent in cefazolin is
positioned close to the side chain of Ser-216 (unpublished observations),
and substitution of Asn for Ser at this position would result in steric
hindrance. Second, the refined crystal structure of PC1 β-lactamase at
2 Å (0.2 nm) indicated that amino acid 216 is located on a short 3₁₀
helix comprising amino acids 215 to 217 and that this helix is stabilized
through a helix N capping (33) between Asn-214, which is highly conserved
among class A β-lactamases (2), and Ser-216 (O-N, 2.9 Å [0.29 nm]). In
addition, the side chain OH of Ser-216 is involved in a hydrogen bond
with the Asn-214 side chain carbonyl (O-O, 3.3 Å [0.33 nm]). Substitution
of Asn for Ser at 216 may alter the topology of this short helix. The
crystal structure of a type C enzyme could help to clarify the structure
of this loop, and attempts are under way to crystallize the type C
enzyme. Third, amino acid 216 is located close to the β3 strand, and
changes at this residue might alter the relative positioning of other
active-site amino acids such as the K-T-G triad.

The reason why the replacement of Thr by Ala at residue 128 should affect
enzyme function is less clear. The effect of the substitution was
primarily a reduction in the kcat for the penicillins and nitrocefin
along with modest effects on the Km values of cefazolin and ampicillin
(Table 2). The crystal structure of PC1 indicates that amino acid 128 is
located at the C terminus of α-helix 4 close to the active-site cleft. It
is two residues away from the highly conserved S-D-N loop (amino acids
130 to 132) of the class A β-lactamases. The catalytic function of the
S-D-N loop has been verified by site-directed mutagenesis of Streptomyces
albus G β-lactamase (16) and E. coli TEM β-lactamase (30). The proximity
of amino acid 128 to the catalytically important S-D-N loop might explain
the kinetic differences between the type A and type D S. aureus
β-lactamases. Preliminary experiments in which residue 128 was replaced
by other amino acids have also shown effects on the kinetics of
hydrolysis (unpublished observations).

In conclusion, naturally occurring type A, C, and D variants of S. aureus
β-lactamase exhibit kinetic differences due to single amino acid
differences at positions close to the active site which have not
previously been shown to be involved in enzymatic activity. It is likely
that the substitution at amino acid 128 affects enzyme function by
altering the structure of the catalytically important S-D-N loop. The
differences between the type A and type C enzymes could be due to steric
hindrance to substrate binding and/or some structural stabilization
effects. This has to be verified by kinetic studies with mutant enzymes
in which amino acids 128 and 216 are substituted with different amino
acids as well as by molecular modelling studies with different β-lactam
substrates.
Extended-spectrum TEM β-lactamases (ESBLs) do not usually confer resistance to β-lactamase inhibitors
such as clavulanate or tazobactam. To investigate the compatibility of the two phenotypes we used site-directed
mutagenesis of the blaTEM-1 gene to introduce into the TEM-1 β-lactamase amino acid substitutions that confer
the ESBL phenotype: TEM-12 (Arg164→Ser), TEM-26 (Arg164→Ser plus Glu104→Lys), TEM-19
(Gly238→Ser), and TEM-15 (Gly238→Ser plus Glu104→Lys). These were combined with three sets of
substitutions that confer inhibitor resistance: TEM-31 (Arg244→Cys), TEM-33 (Met69→Leu), and TEM-35
(Met69→Leu and Asn276→Asp). Introduction of the Arg244→Cys substitution gave rise to inhibitor-resistant
hybrid enzymes that either lost ESBL activity (TEM-12, TEM-15, and TEM-19) or had reduced activity
(TEM-26) against ceftazidime. In contrast, the introduction of Met69→Leu or Met69→Leu plus Asn276→Asp
substitutions did not significantly affect the abilities of the enzymes to confer resistance to ceftazidime,
although increased susceptibility to cefotaxime was observed with Escherichia coli strains that expressed the
TEM-19 and TEM-26 β-lactamases. With the exception of the TEM-12 β-lactamase, introduction of the
Met69→Leu substitution did not give rise to enzymes with increased resistance to clavulanate compared to that
of the TEM-1 β-lactamase. However, introduction of the double substitution Met69→Leu plus Asn276→Asp
in the ESBLs did give rise to low-level (TEM-19, TEM-15, and TEM-26) or moderate-level (TEM-12)
clavulanate resistance. None of the hybrid enzymes were as resistant to clavulanate as the corresponding
inhibitor-resistant TEM β-lactamase mutant, suggesting that active-site configuration in the ESBLs limits the
degree of clavulanate resistance conferred.



Gram-negative bacteria may exhibit reduced susceptibility to β-lactam
antibiotics by a number of mechanisms including reduced outer membrane
permeability, target-site modification, and efflux of the β-lactam out of
the cell (20, 23). However, by far the most common mechanism of
resistance is the enzymatic inactivation of the β-lactam by a β-lactamase
(18). There are many types of β-lactamases, which have been classified by
their amino acid sequences and corresponding substrate profiles (6). The
TEM-1 β-lactamase belongs to a functional group of broad-spectrum enzymes
that are inhibited by clavulanate (6). This group includes enzymes such
as the SHV-1 and OHIO-1 β-lactamases. Although the TEM-1 β-lactamase does
not usually provide protection against extended-spectrum cephalosporins
such as ceftazidime and cefotaxime or β-lactamase inhibitors like
clavulanate and tazobactam (except in the case of TEM-1 overproduction),
amino acid substitutions can alter the hydrolytic spectrum of the
β-lactamase to encompass these compounds.

Extended-spectrum TEM β-lactamases (ESBLs) do not usually confer
resistance to β-lactamase inhibitors, suggesting that the two phenotypes
may be incompatible. In support of this suggestion, Imtiaz et al. (15)
have shown that introduction of an amino acid substitution (Arg164→Ser)
that confers on the TEM-1 β-lactamase the ability to efficiently
hydrolyze ceftazidime leads to the loss of clavulanate resistance when
introduced into the inhibitor-resistant β-lactamase TEM-31. However,
recently a clinical Escherichia coli isolate that expressed a
β-lactamase, TEM-50 (CMT-1), that conferred low-level resistance both to
β-lactamase inhibitors and to extended-spectrum cephalosporins has been
reported (22).

In order to investigate this phenomenon further we used site-directed
mutagenesis of the TEM β-lactamase-encoding gene to introduce into ESBLs
amino acid substitutions known to confer inhibitor resistance. We found
that the different amino acid substitutions gave rise to enzymes that
conferred different resistance phenotypes. None of the substitutions
conferred high-level resistance to both β-lactamase inhibitors and
extended-spectrum cephalosporins, although the double amino acid
substitution (Met69→Leu, Asn276→Asp) in the TEM-12 β-lactamase did give
rise to an ESBL with a moderate level of clavulanate resistance.

MATERIALS AND METHODS 

Bacterial strains and plasmids. E. coli CJ236 [dut-1 ung-1 thi-1 relA1; pCJ105
(Cm^r)] and E. coli MV1190 [Δ(lac-proAB) thi supE Δ(srl-recA)306::Tn10(Tet^r);
(F' traD36 proAB lacI^qZΔM15)] were used in this study. The plasmid vector
pTZ18U was used as the initial source of the blaTEM gene. All bacteria were
grown in Luria-Bertani (LB) broth or on LB agar (Oxoid, Basingstoke, United
Kingdom) containing the appropriate antibiotic (chloramphenicol, 20 μg/ml;
amoxicillin, 100 μg/ml; or tetracycline, 10 μg/ml).

Antibiotics and reagents. The following companies kindly supplied antibiotic
powders of known potencies: Bristol Meyers Squibb (cefepime and aztreonam);
American Cyanamid (piperacillin and tetracycline); Glaxo Group Research Ltd.
(ceftazidime and cephaloridine); Roussel Laboratories Ltd. (cefotaxime and
chloramphenicol); and SmithKline Beecham (amoxicillin, clavulanate,
temocillin, and ticarcillin). Nitrocefin was obtained from Oxoid.
TABLE 1. Oligonucleotides used in site-directed mutagenesis experiments and in DNA sequencing

Procedure and      Sequence (5'-3')^a             Codon change and
oligonucleotide                                   nucleotide position^b

Mutagenesis
  Ile84→Val        GGCGTCAACACGGGATAAT            ATT→GTT
  Val184→Ala       ATTGCGGCAGGCATCGTGG            GTA→GCC
  Met69→Leu        TAAAAGTGCTCAGCATTGGAAAAC       ATG→CTG
  Asn276→Asp       (illegible)                    AAT→GAT
  Arg244→Cys       (illegible)                    (illegible)
  Glu104→Lys       GTGAGTACTTAACCAAGT             GAG→AAG
  Gly238→Ser       CACGCTCACTGGCTCCAGATTTAT       (illegible)
  Arg164→Ser       GTTCCCAAGAATCAAGGC             CGT→TCT
Sequencing
  f-TEM2F          GTATGAGTATTCAACATTTCCGTGTCG    205-231
  f-TEM2R          ACCAATGCTTAATCAGTGAGGCA        1064-1042
  f-TEMi           ACTGTCATGCCATCCGTAAGA          556-536
  f-TEM2i          CTGCGGCCAACTTACTTCTGACAA       598-621

^a Altered bases are underlined.
^b Nucleotide positions are according to Sutcliffe (27).



Susceptibility testing. MICs were determined by agar dilution on Diagnostic
Sensitivity Test Agar (CM261; Oxoid) with an inoculum of about 10^4 organisms
per spot as described previously (24). E. coli NCTC 10418 was used as the
control strain.

Site-directed mutagenesis. Site-directed mutagenesis was performed with the
reagents contained within the Muta-Gene Phagemid In Vitro Mutagenesis kit
(version 2) from Bio-Rad (Hemel Hempstead, United Kingdom). The procedures
used in this kit are based on the method originally described by Kunkel et
al. (17). Oligonucleotides were designed with the aid of oligonucleotide design
software (PrimerSelect; DNAStar) and were based on the sequence of the
blaTEM-1 gene reported by Sutcliffe (27). The oligonucleotides were custom made
by Pharmacia Biotech (St. Albans, United Kingdom) (Table 1).
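
The sequence bookkeeping behind a mutagenic oligonucleotide (replace one codon in the coding strand, keep some flanking bases, and, if the oligonucleotide must anneal to the coding strand, take the reverse complement) can be sketched as below. This is only an illustration of the codon arithmetic, not the Kunkel procedure or the PrimerSelect design software; the function names, the flank length, the choice of strand, and the template in the usage note are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def substitute_codon(coding, codon_index, new_codon):
    """Replace the codon at a 0-based codon index in an in-frame coding sequence."""
    if len(new_codon) != 3:
        raise ValueError("a codon has exactly three bases")
    start = 3 * codon_index
    if start + 3 > len(coding):
        raise IndexError("codon index is outside the sequence")
    return coding[:start] + new_codon.upper() + coding[start + 3:]


def mutagenic_oligo(coding, codon_index, new_codon, flank=9, antisense=True):
    """Return the mutated codon plus `flank` bases on each side.

    With antisense=True the result is reverse complemented, i.e. it would
    anneal to the coding strand (an assumption about the template).
    """
    mutated = substitute_codon(coding.upper(), codon_index, new_codon)
    start = max(0, 3 * codon_index - flank)
    end = min(len(mutated), 3 * codon_index + 3 + flank)
    window = mutated[start:end]
    return reverse_complement(window) if antisense else window
```

For example, with the made-up template `AAACGTGGG`, changing codon 1 from CGT to TCT with `flank=3` gives `CCCAGATTT` on the antisense strand.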

DNA sequencing. In order to confirm that mutations had been introduced,
plasmid DNA was extracted with a Qiagen QIAprep kit (Qiagen Ltd., Crawley,
United Kingdom) and was sequenced in both directions with fluorescein-labelled
primers (Table 1). DNA sequencing was performed with the reagents contained
in a cycle sequencing kit (RPN 2438) from Amersham Life Sciences (Little
Chalfont, United Kingdom) by following the manufacturer's instructions. The
annealing temperature for the cycle sequencing reactions was 60°C, and the
DNA sequence was determined with an automated DNA sequencer (Pharmacia
Biotech).

Determination of IC50s. Each strain was grown at 37°C in brain heart infusion
broth (Oxoid) for 16 h, with shaking (200 rpm). The cells were harvested by
centrifugation and were resuspended in 0.5 ml of sterile distilled water, and the
β-lactamase was released by sonication. Sonication was performed for 20 s with
a W-385 sonicator (Heat Systems-Ultrasonics, Inc., Farmingdale, N.Y.) with the
following settings: 5-s cycle time, 50% duty cycle, and a 1.5 output control setting.
β-Lactamase activity was measured by monitoring the rate of nitrocefin
hydrolysis (10 μM) at 482 nm in a Biochrom 4060 spectrophotometer (Pharmacia
Biotech). All assays were performed in 0.1 M phosphate buffer (pH 7.0) and at
37°C. In order to take into account the different levels of β-lactamase activity
within the samples, the activity of each sample was standardized to give an
absorbance change of 0.15 per min. Samples were preincubated for 10 min at
37°C with various concentrations (0.01 to 50 μM) of the β-lactamase inhibitor
before the β-lactamase activity was determined with nitrocefin (10 μM) as the
reporter substrate. The concentration of β-lactamase inhibitor required to inhibit
50% of the β-lactamase activity (IC50) was then determined graphically.
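
The IC50 was read graphically in this study. A numerical stand-in for reading a 50% point off an inhibition curve is linear interpolation on a log10 concentration axis between the two tested concentrations that bracket 50% residual activity. The sketch below assumes that approach; the function names and the example rates are hypothetical and are not data from the paper.

```python
import math


def percent_activity(rate_with_inhibitor, rate_without_inhibitor):
    """Residual activity as a percentage of the uninhibited nitrocefin rate."""
    return 100.0 * rate_with_inhibitor / rate_without_inhibitor


def ic50_by_interpolation(concs_um, pct_activity):
    """Estimate the IC50 (same units as concs_um) by log-linear interpolation.

    Returns None when the tested range never crosses 50% residual activity.
    Concentrations must be positive because a log10 axis is used.
    """
    points = sorted(zip(concs_um, pct_activity))
    for (c_lo, a_lo), (c_hi, a_hi) in zip(points, points[1:]):
        if a_lo >= 50.0 >= a_hi:
            if a_lo == a_hi:  # both points sit exactly at 50%
                return c_lo
            frac = (a_lo - 50.0) / (a_lo - a_hi)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical absorbance changes per minute; 0.15 is the standardized
# uninhibited rate described in the text.
rates = {0.01: 0.135, 0.1: 0.105, 1.0: 0.045}
activity = {c: percent_activity(r, 0.15) for c, r in rates.items()}
estimate = ic50_by_interpolation(list(activity), list(activity.values()))
```

Standardizing every extract to the same uninhibited rate, as the method describes, means the percentages for different enzymes are computed against the same baseline.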

RESULTS 

Mutagenesis. Phagemid pTZ18U conveniently encodes a blaTEM gene, and use
of pTZ18U thus obviates the need to subclone the blaTEM gene from another
source. However, the blaTEM gene from pTZ18U is not identical to blaTEM-1
as the result of two nucleotide changes, G→A and C→T, that were
introduced to remove PstI and HincII restriction sites, respectively.
While the resulting amino acid substitutions, Ile84→Val and Ala184→Val,
have been regarded as neutral (22a), Chaibi et al. (8) have demonstrated
that the catalytic efficiency of the "artificial" TEM β-lactamase was
one-half to one-third lower than that of the TEM-1 β-lactamase.
Consequently, in this study we initially converted the artificial blaTEM
into blaTEM-1 and subsequently used this gene as the template for the
construction of the TEM mutants. Four ESBL enzymes (TEM-12, TEM-15,
TEM-19, and TEM-26) were constructed together with three
β-lactamase-inhibitor-resistant mutants (TEM-31, TEM-33, and TEM-35)
(Table 2). In order to investigate whether the amino acid substitutions
found in β-lactamase inhibitor-resistant mutants could confer inhibitor
resistance if introduced into ESBL enzymes, the three sets of amino acid
substitutions that confer inhibitor resistance were engineered into the
extended-spectrum TEM β-lactamases by altering the gene-coding sequence.
The amino acid substitutions corresponded to those found in the TEM-31
(Arg244→Cys), TEM-33 (Met69→Leu), and TEM-35 (Met69→Leu and Asn276→Asp)
β-lactamases. In all cases the introduced nucleotide changes in the
blaTEM gene were confirmed by DNA sequencing.

Phenotypic characterization of TEM-1 β-lactamase and mutant derivatives.
(i) TEM-1 and ESBL enzymes. The MICs of ampicillin and ticarcillin in the
presence of clavulanate (2 μg/ml) and piperacillin in the presence of
tazobactam (4 μg/ml) for E. coli MV1190 expressing the TEM-1 β-lactamase
were relatively high (Table 2). This could be accounted for by the large
quantity of the TEM-1 β-lactamase expressed as a result of the high copy
number of the pTZ18U plasmid carrying the blaTEM-1 gene (Table 3).
Despite this, because the TEM-1 β-lactamase and the mutant enzymes in
this study shared the same genetic background, comparisons between the
mutant enzymes and the TEM-1 β-lactamase could still be made.

The TEM-12, TEM-15, and TEM-26 β-lactamases were found to confer 16- to
128-fold higher levels of resistance to ceftazidime than the TEM-1
β-lactamase, confirming that these enzymes were indeed ESBLs (Table 2).
Although the TEM-19 β-lactamase did not confer increased levels of
resistance to ceftazidime, a 16-fold increase in the level of resistance
to cefotaxime was observed. Cefepime was found to be less effective
against E. coli MV1190 strains that expressed the TEM-12 and TEM-26
β-lactamases, and with the exception of TEM-19, the ESBLs conferred
higher levels of resistance to aztreonam than the TEM-1 β-lactamase did.
E. coli MV1190 expressing any of the four ESBL enzymes was found to be
more susceptible to the penicillin-β-lactamase inhibitor combinations
than E. coli MV1190 expressing the TEM-1 β-lactamase. None of the ESBL
enzymes conferred increased resistance to temocillin. Measurement of the
β-lactamase activities
TABLE 3. β-Lactamase activities and clavulanate and tazobactam IC50s for TEM-1 β-lactamase and TEM mutant enzymes

Amino acid change           Designation  β-Lactamase  Clavulanate  Tazobactam
from TEM-1^a                             activity^b   IC50         IC50

TEM-1                       TEM-1        5,330        0.08         0.03
M69L                        TEM-33       586          2            0.5
M69L, N276D                 TEM-35       1,190        12           0.6
R244C                       TEM-31       514          10           1.7
G238S                       TEM-19       2,950        0.001        0.002
G238S, M69L                              853          0.03         0.02
G238S, M69L, N276D                       2,360        0.38         0.04
G238S, R244C                             485          1.5          0.3
G238S, E104K                TEM-15       29           0.002        0.004
G238S, E104K, M69L                       319          0.07         0.01
G238S, E104K, M69L, N276D   TEM-50       952          0.3          0.02
G238S, E104K, R244C                      696          1.1          1.2
R164S                       TEM-12       25           0.02         0.02
R164S, M69L                              49           0.25         0.2
R164S, M69L, N276D                       1,440        1.8          0.3
R164S, R244C                             937          4.5          5
R164S, E104K                TEM-26       523          <0.01        0.06
R164S, E104K, M69L                       211          0.08         0.3
R164S, E104K, M69L, N276D                458          0.35         0.2
R164S, E104K, R244C                      54           3.5          2

^a See footnote a to Table 2 for a key to the amino acid substitutions.
^b Activities are expressed as nanomoles of nitrocefin hydrolyzed per minute
per milligram of protein.
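
The comparisons drawn from Table 3 in the text are ratios of IC50s against the TEM-1 value. A minimal sketch of that arithmetic follows, using a few clavulanate IC50s transcribed from Table 3 for the TEM-12 series; the dictionary labels and function names are illustrative choices, not the authors' notation.

```python
# Clavulanate IC50s transcribed from Table 3 (same units throughout).
CLAVULANATE_IC50 = {
    "TEM-1": 0.08,
    "TEM-12 (R164S)": 0.02,
    "R164S, M69L": 0.25,
    "R164S, M69L, N276D": 1.8,
    "TEM-35 (M69L, N276D)": 12.0,
}


def fold_change(ic50, reference_ic50):
    """Ratio of an IC50 to a reference IC50; values above 1 mean less inhibition."""
    if reference_ic50 <= 0:
        raise ValueError("reference IC50 must be positive")
    return ic50 / reference_ic50


def fold_changes(ic50_by_enzyme, reference="TEM-1"):
    """Fold change of every enzyme's IC50 relative to the reference enzyme."""
    ref = ic50_by_enzyme[reference]
    return {name: fold_change(value, ref) for name, value in ic50_by_enzyme.items()}


relative_to_tem1 = fold_changes(CLAVULANATE_IC50)
```

On these values the parental TEM-12 enzyme falls below the TEM-1 reference, the double-substituted TEM-12 hybrid falls above it, and TEM-35 remains higher still, which is the ordering the text describes.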



of the ESBLs with nitrocefin as the reporter substrate indicated that the
ESBL enzymes had lower levels of activity against nitrocefin than the
TEM-1 β-lactamase; this was also true for the other mutant TEM
β-lactamases (Table 3).

(ii) Inhibitor-resistant enzymes. The substitutions Arg244→Cys,
Met69→Leu, and Met69→Leu plus Asn276→Asp in the TEM-1 β-lactamase gave
rise to enzymes with resistance to clavulanate combined with resistance
to amoxicillin or ticarcillin. Substitution of a Cys residue at position
244 of the TEM-1 β-lactamase also resulted in an enzyme (TEM-31) that
conferred lower levels of resistance to penicillins and cephaloridine
than the levels conferred by TEM-1 (Table 2). In contrast, a single
Met69→Leu substitution and a double substitution, Met69→Leu plus
Asn276→Asp, in the TEM-1 β-lactamase did not greatly affect the MICs of
piperacillin, although a fourfold reduction in resistance to
cephaloridine was observed (Table 2). None of the inhibitor-resistant
enzymes conferred resistance to extended-spectrum cephalosporins,
aztreonam, or temocillin.

(iii) Substitution of Cys for Arg at position 244 in ESBLs. Introduction
of the Arg244→Cys substitution into the ESBL enzymes had an effect
similar to that in TEM-1. Like the TEM-31 β-lactamase, the resulting
hybrid enzymes conferred lower levels of resistance to penicillins and
cephaloridine than their respective parent enzymes did. However, the MICs
of penicillin-inhibitor combinations were elevated for the strain with
TEM-26 plus the Arg244→Cys substitution. In addition, the amino acid
substitution resulted in enzymes that either had lost (TEM-12, TEM-15,
and TEM-19) or had a reduced ability (TEM-26) to confer resistance to
ceftazidime. The MICs of cefepime and aztreonam were reduced 4- and
16-fold, respectively, for E. coli MV1190 expressing the hybrid TEM-26
β-lactamase compared to the MICs for the strain expressing the TEM-1
β-lactamase. All the hybrid enzymes with the Arg244→Cys substitution were
found to be more resistant to clavulanate and tazobactam inhibition than
the TEM-1 β-lactamase, with the IC50s for the hybrid enzymes being
comparable to those for the naturally occurring inhibitor-resistant TEM
β-lactamases (Table 3).

(iv) Substitution of Leu for Met at position 69 in ESBLs. The IC50s of
clavulanate for the parental ESBL enzymes were lower than those of the
TEM-1 β-lactamase, indicating that the ESBL enzymes were more susceptible
to clavulanate inhibition than the TEM-1 β-lactamase. Introduction of a
Leu residue at position 69 in the ESBLs resulted in hybrid enzymes that
conferred increased levels of resistance to both clavulanate and
tazobactam compared to the level of resistance conferred by their
respective parental ESBL enzymes (Table 3). In the case of the TEM-12
β-lactamase, the amino acid substitution gave rise to a hybrid enzyme
that was less susceptible to clavulanate inhibition than the TEM-1
β-lactamase. For the other ESBLs, however, the substitution resulted in
hybrid enzymes for which clavulanate IC50s were similar to those for the
TEM-1 β-lactamase. Tazobactam was found to be equally effective against
the TEM-1 β-lactamase and the hybrid ESBLs with a Gly238→Ser substitution
(TEM-15 and TEM-19). However, tazobactam was less effective against
hybrid ESBLs with the Arg164→Ser substitution (TEM-12 and TEM-26)
(Table 3).

In contrast to the Arg244→Cys substitution, introduction of a Leu residue
at position 69 in the ESBLs resulted in hybrid enzymes that retained the
ability to confer resistance to ceftazidime and, in the case of the
TEM-12 hybrid enzyme, that had increased levels of resistance to
ceftazidime in combination with clavulanate. However, the amino acid
substitution in the TEM-19 β-lactamase gave rise to a hybrid enzyme that
conferred a lower level of resistance to cefotaxime than the parent
enzyme did.

(v) Substitution of Leu for Met at position 69 and Asp for Asn at
position 276 in ESBLs. Introduction of the double amino acid substitution
Leu-69 and Asp-276 in the ESBLs gave rise to hybrid enzymes that were
more resistant to clavulanate inhibition than hybrid ESBL enzymes with a
single Leu-69 substitution. In the case of the TEM-12 β-lactamase, this
gave rise to an enzyme for which the IC50 of clavulanate was similar to
that for the inhibitor-resistant enzyme TEM-33. For the other hybrid
ESBLs, however, the IC50s of clavulanate were intermediate between that
for the TEM-1 β-lactamase and those for the inhibitor-resistant enzymes.
The IC50s of tazobactam were similar for the hybrid enzymes with single
or double amino acid substitutions. The double substitution did not
greatly affect the ability of the enzymes to confer resistance to
ceftazidime, although increased susceptibility to cefotaxime was apparent
with E. coli MV1190 expressing the hybrid derivatives of the TEM-19 and
TEM-26 β-lactamases. Strains expressing two of the hybrid enzymes, the
TEM-12 and TEM-26 derivatives, showed markedly reduced susceptibility to
ceftazidime in combination with clavulanate. As a consequence of the
double amino acid substitution, extended-spectrum resistant variants of
TEM-12, TEM-15, and TEM-26 β-lactamases that also conferred increased
levels of resistance to β-lactamase inhibitors were constructed. However,
the levels of clavulanate resistance conferred by the hybrid ESBLs were
not as high as that conferred by the corresponding inhibitor-resistant
TEM β-lactamase TEM-35.

DISCUSSION 

Substitution of Cys for Arg at position 244. In this study we replaced
the Arg at position 244 in the TEM-12, TEM-15, TEM-19, and TEM-26
β-lactamases with a Cys residue in order to investigate whether the amino
acid substitution would give rise to inhibitor-resistant ESBLs. In each
case the substitution conferred increased levels of resistance to
β-lactamase inhibitors, but the substitution also gave rise to enzymes
that conferred lower degrees of resistance to penicillin and
cephalosporins. Both these observations are consistent with a disruption
of the Arg244 hydrogen-bonding arrangement predicted to occur in the
TEM-1 β-lactamase (26, 30). Since the Cys residue at position 244 would
be unable to form a hydrogen bond to the common carboxylate group of
β-lactam antibiotics, this probably explains why the MICs of both
penicillins and cephalosporins were affected by the amino acid
substitution. This would be especially pertinent if, as suggested by
Zafaralla et al. (30), the binding energy of Arg244 is used to lower the
activation energy of the hydrolytic reaction.

The resistance to β-lactamase inhibitors conferred by the hybrid enzymes
in this study is understandable in light of the essential role that the
Arg244 residue plays in maintaining in position the water molecule
(Wat399) believed to be important in the inactivation of β-lactamase by
clavulanate (14, 28). In naturally occurring variants of the TEM-1 and
TEM-2 β-lactamases, as in our mutants, substitution of Cys, Ser, or His
residues at position 244 has given rise to inhibitor-resistant enzymes
(1, 3, 4, 29). The shorter side chains of the substituted amino acids in
the inhibitor-resistant variants are thought to be unable to form a
hydrogen bond with Wat399, which is displaced as a consequence and which
is unable to act as a proton source in the inactivation process (14, 16,
19). However, our results contrast with those of Imtiaz et al. (15), who
reported that a substitution of a Ser for Arg at position 244 in the
TEM-12 β-lactamase (also derived from TEM-1) neither conferred inhibitor
resistance nor significantly affected the enzyme's ability to hydrolyze
ceftazidime. Why the two different amino acid substitutions gave rise to
two different effects is not clear. Imtiaz et al. (15) have suggested
that an alteration of the topology of the active site that is caused by
the Arg164→Ser substitution in the TEM-12 β-lactamase may have resulted
in a different clavulanate binding arrangement that promoted a
repositioning of the water molecule close to the site of inactivation.
Consistent with this suggestion we found that the four ESBLs in this
study were more sensitive to clavulanate inhibition than the TEM-1
β-lactamase. If a different clavulanate binding arrangement does occur in
the hybrid enzymes, the results of this study show that the nature of the
residue at position 244 is still important in dictating whether the
enzyme is resistant to clavulanate or not. Thus, it would appear that the
Ser residue, but not the Cys residue, either performs a role similar to
that of the Arg residue in the hybrid enzymes or, through structural
rearrangement, promotes another residue to perform a similar function.

Substitution of Leu for Met at position 69. Unlike the TEM-1 β-lactamase,
in which substitutions of Leu, Val, or Ile for the Met at position 69
have all given rise to inhibitor-resistant enzymes (9, 11, 25, 31),
substitution of a Leu residue for the Met residue at this position in the
four ESBLs did not give rise to clavulanate-resistant enzymes. This
probably can be explained by a different binding arrangement of the
clavulanate molecule in the active site of the hybrid ESBL enzymes
compared to that in the TEM-1 β-lactamase. As noted previously, the
parental ESBLs were more sensitive to clavulanate inhibition than the
TEM-1 β-lactamase, suggesting that alterations within the active site
enhanced the inhibitory action of clavulanate. Although the hybrid ESBLs
were not resistant to clavulanate, they were less sensitive to
clavulanate inhibition than their respective parent enzymes were. The
substitutions at position 69 are thought to cause slight alterations to
the active-site structure of the TEM-1 β-lactamase, resulting in
deformation of the oxyanion hole and a less favorable binding orientation
of the clavulanate molecule (10). This suggests that the clavulanate
molecule still interacts with the oxyanion hole but possibly in a
different manner.

Substitutions of Met69→Ile or Met69→Val in the SHV-5 β-lactamase and
Met69→Ile in an OHIO-1 β-lactamase mutant bearing a Gly238→Ser
substitution have all given rise to enzymes that were less susceptible to
inhibition by clavulanate than their respective parent enzymes (2, 12).
These mutant enzymes exhibited reduced penicillinase activity and, in the
case of the SHV enzymes, a reduced ability to hydrolyze cephalothin and
cefotaxime (2, 12). In contrast, substitution of a Leu residue at
position 69 in the extended-spectrum TEM β-lactamases in this study did
not significantly affect the ability of the enzymes to confer resistance
to penicillins and ceftazidime, although reduced levels of resistance to
cefotaxime were noted with the TEM-19 hybrid enzyme. These variations may
or may not be related to the different nature of the substituted residues
in each case. While all three residues may exert a hydrophobic effect,
only the branched residues Val and Ile are thought to produce additional
steric constraints. The smaller impact of the Leu69 substitution on the
structure of the TEM-32 β-lactamase has been used to explain the lower
clavulanate Ki value for this enzyme compared with that for the TEM-1
β-lactamase with Met69→Ile or Met69→Val substitutions (9). Whether the
substitution of Val or Ile into position 69 of the ESBLs examined in this
study would give rise to hybrid enzymes with greater degrees of
clavulanate resistance has yet to be determined. However, in light of the
reduced penicillinase activity of the SHV and OHIO enzymes, a similar
reduction in resistance to penicillins and possibly cephalosporins may
also be observed.

Substitution of Asp for Asn at position 276 plus Leu for Met at position
69. Amino acid substitutions at position 276 have been found naturally
only in combination with changes at position 69 in the TEM β-lactamase
(5, 13, 22, 31), although the change can confer inhibitor resistance in
the absence of a substitution at position 69 (7, 21, 28). Recently, Sirot
et al. (22) have reported on a natural variant of the TEM-15 β-lactamase,
designated TEM-50, with amino acid substitutions, Met69→Leu and
Asn276→Asp, found in the inhibitor-resistant β-lactamase TEM-35. An E.
coli strain expressing the TEM-50 β-lactamase displayed susceptibilities
to β-lactams, including ceftazidime and cefotaxime, that were between
those for strains expressing the TEM-15 or TEM-35 β-lactamases. In our
study we artificially constructed the TEM-50 β-lactamase together with
mutants of the TEM-12, TEM-19, and TEM-26 β-lactamases. In contrast to
Sirot et al. (22), we found that the MIC of ceftazidime for E. coli
MV1190 expressing the TEM-50 β-lactamase was only twofold lower than the
MIC of ceftazidime for the same strain expressing the TEM-15 β-lactamase.
A possible explanation for this difference may have been the
exceptionally high level of β-lactamase expressed from the
high-copy-number plasmid pTZ18U harboring the TEM-coding gene used in
this study. Such high levels of β-lactamase expression may have masked
small differences in hydrolytic activities between the enzymes.

In agreement with Sirot et al. (22), we found that the TEM-50 β-lactamase
conferred low levels of resistance to clavulanate. We also demonstrated
that when the double amino acid substitutions were introduced into the
TEM-19, TEM-12, and TEM-26 β-lactamases the resulting enzymes also
conferred increased levels of resistance to clavulanate and, in the case
of the TEM-12 and TEM-26 derivatives, retained ceftazidime resistance.
Indeed, these two hybrid mutants showed considerably reduced levels of
susceptibility to the ceftazidime-clavulanate combination. Previous
studies have shown that clavulanate is more potent against strains that
produce inhibitor-resistant TEM β-lactamases with a single substitution
(Asn276→Asp) than against those that have double substitutions (Met69→Leu
and Asn276→Asp) (5, 7). Similarly, we demonstrate that double
substitutions within the four ESBLs in this study also resulted in hybrid
enzymes that conferred greater resistance to clavulanate than the levels
of resistance conferred by those with the single Met69→Leu substitution.
Furthermore, consistent with the study of Canica et al. (7) on
inhibitor-resistant TEM β-lactamases, we found tazobactam to be more
potent than clavulanate against strains producing inhibitor-resistant
enzymes with double substitutions. Thus, there appears to be a
correlation between the inhibitor resistance phenotypes conferred by the
single (Met69→Leu) and double (Met69→Leu and Asn276→Asp) substitutions in
the TEM-1 β-lactamase and those conferred by the same substitutions in
the extended-spectrum TEM β-lactamases.

In conclusion, of the hybrid enzymes constructed, the hybrid
of the TEM-12 β-lactamase conferred the greatest reduction in
sensitivity to clavulanate while it retained the ability to confer
resistance to ceftazidime. As with all the hybrid enzymes, including
those with the Arg244→Cys substitutions, the level of
resistance to penicillin-clavulanate combinations that was conferred
(Table 2) and the reduction in the degree of sensitivity
to inhibition by clavulanate (Table 3) were not as high as those
for equivalent inhibitor-resistant TEM β-lactamases. This suggests
that the altered active-site configuration in the ESBL
enzymes limits the degree of clavulanate resistance conferred
by the ESBL-inhibitor hybrid enzymes. Whether this is due to
a different binding arrangement of the clavulanate molecule in
the active site of the extended-spectrum TEM β-lactamases or
some other factor has yet to be determined.
In 1980 and 1981, Grantham and his colleagues (1,2) reported
the codon usages in a total of 161 protein genes in this journal,
and in 1986 and 1988 we reported those in 1638 and 3681 genes,
analyzing all the data available in those days (3,4). Now the codon
usages in 11415 genes can be analyzed using the nucleotide
sequence data obtained from the GenBank Genetic Sequence Data
Bank (Release 62.0, Dec. 1989). Because of the growing size
of the database, only a part of the data is listed this year. It is planned
to distribute the electronic version of the Sequence Supplement of
this journal on CD-ROM, beginning in 1991. This year is
a transition year, and thus we will send, upon request, a magnetic
tape or a hard copy listing the codon usages in 11415 genes.

Table 1 lists the codon use in each of the 1543 nuclear genes 
registered in the GenBank Primate Sequence File. The LOCUS 
names given in the GenBank were used for designating individual 
genes, and the SHORT DIRECTORY of the GenBank is 
presented for defining each LOCUS name (Table 3). 

To reveal the characteristics of the codon use of a wide range
of organisms, as well as viruses and organelles, the frequency
(per one thousand) of codon use in each organism for which more
than 20 genes are available was calculated by summing up
numbers of codon use (Table 2). The number of genes summed
for each organism is given in the row designated as No. GENES,
and the total codon number thus summed is given in the bottom
row. Since the codon usage of each organism thus summed has
been expressed in frequency per one thousand, it is easy to
compare the codon-usage patterns among different organisms.
Confirming the 'genome hypothesis' of Grantham et al. (1, 2),
among taxonomically related organisms (e.g. among mammals)
the codon-choice patterns resemble each other but they differ
between distant organisms (see also scatter diagrams of Fig. 1).
Synonymous codon-choice patterns in different genes of a single
unicellular organism are known to be usually similar to each
other regardless of gene functions and thus to the pattern listed
in Table 2 (the dialectal codon-choice pattern found for individual
unicellular organisms; see ref. 5). However, codon-choice
patterns in one higher vertebrate often differ significantly between
different genes (5-7). The diverse codon-choice patterns found
among genes of a single higher vertebrate have been pointed out
in connection with the evident diversity in the G+C% at the
codon third position among the genes (5,6). It should be stressed
that the characteristic pattern for the mammals listed in Table
2 is obtained only after summing up the genes with varying
functions (3,4). When codon usages of approximately 10 or more
genes with varying functions were summed up for each mammal,
they usually resulted in a very similar pattern and thus in the
pattern listed in Table 2, regardless of differences in the genes
used for the summation (3,4). The fact that the pattern roughly
common among the mammals does not depend on the genes used
for the summation shows that this relates to general
characteristics of their genomes: 1) deficiency of CpG and TpA
dinucleotides (8); 2) paucity of genes in the A+T-rich genome
portion, i.e. in A+T-rich isochores and in G/Q bands (see ref.
9, 10; thus C- and G-ending codons are preferred); 3) gross
similarity of tRNA population between different organs of higher
vertebrates (our unpublished data); 4) gross similarity of amino
acid composition between different proteins, as well as between
different mammals.
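
The per-thousand summation described above can be illustrated with a minimal sketch. It assumes each gene is supplied as an in-frame coding sequence string; the function names (`count_codons`, `usage_per_thousand`, `gc3_percent`) and the two toy sequences are hypothetical and are not taken from the compilation itself. The last function shows one way the G+C% at the codon third position of a single gene might be computed.

```python
from collections import Counter


def count_codons(cds):
    """Count in-frame codons of one coding sequence (frame starts at base 1)."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def usage_per_thousand(genes):
    """Sum codon counts over all genes, then scale to frequency per 1000 codons."""
    total = Counter()
    for cds in genes:
        total.update(count_codons(cds))
    n_codons = sum(total.values())
    freq = {codon: 1000.0 * n / n_codons for codon, n in total.items()}
    return freq, n_codons


def gc3_percent(cds):
    """Percentage of G or C at the third codon position of one gene."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    thirds = seq[2:usable:3]  # every third base, starting at codon position 3
    return 100.0 * sum(base in "GC" for base in thirds) / len(thirds)


# Two toy sequences (invented for illustration, not GenBank entries).
genes = ["ATGGCCTAA", "ATGGCCGCCTGA"]
freq, n_codons = usage_per_thousand(genes)
print(len(genes), n_codons, round(freq["GCC"], 1))
```

Because the counts are summed before scaling, long genes contribute more codons than short ones; this mirrors the summation described in the text rather than an average of per-gene frequencies.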



METHODS 

In selecting protein coding sequences we relied on the
FEATURES tables of the GenBank, and only complete genes,
starting with an initiation codon and ending with one of the stop
codons, were used in the analysis (see ref. 3 for details). In the
GenBank, a group of consecutive genes whose entire region has
been sequenced is registered under one LOCUS name. To
distinguish the different genes belonging to a single LOCUS,
symbol # followed by a number is added after the LOCUS name;
the numbers represent the order of the peptides registered in the
FEATURES of the GenBank. When introns of a gene have not
been completely sequenced, some of its exons are registered in
separate entries (LOCUS) in the GenBank. These exons belonging
to the same gene but having different LOCUS names were
combined, and the LOCUS name of the last exon followed by
symbol * was given to the gene thus combined (3,4). The order
of the codons in the table is the same as in the previous compilations
(1-4).
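
The selection and naming rules above might be sketched as follows. This is only an illustration under stated assumptions: ATG is taken as the sole initiation codon and TAA, TAG and TGA as the stop codons (the standard nuclear code), peptide numbering is assumed to start at 1, and a LOCUS holding a single gene is assumed to carry no suffix. None of these details is specified in the text, and the function names and LOCUS strings are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard nuclear genetic code


def is_complete_cds(cds, start_codon="ATG"):
    """True if the sequence starts with the initiation codon, ends with a
    stop codon, and has a length that is a whole number of codons."""
    seq = cds.upper().replace("U", "T")
    return (
        len(seq) >= 6
        and len(seq) % 3 == 0
        and seq[:3] == start_codon
        and seq[-3:] in STOP_CODONS
    )


def gene_labels(locus, n_peptides):
    """Labels for genes under one LOCUS: LOCUS#1, LOCUS#2, ... in FEATURES order.
    Assumption: a LOCUS with a single gene keeps its plain name."""
    if n_peptides == 1:
        return [locus]
    return [f"{locus}#{k}" for k in range(1, n_peptides + 1)]


def combined_label(exon_loci):
    """Label for a gene assembled from exons in separate entries:
    the LOCUS name of the last exon followed by '*'."""
    return exon_loci[-1] + "*"


# Hypothetical inputs, for illustration only.
print(is_complete_cds("ATGGCCTAA"))
print(gene_labels("LOCUSX", 2))
print(combined_label(["EXONA", "EXONB"]))
```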
Table 3. Correspondence between LOCUS names and gene names for Table 1, and abbreviation for Table 2.



•AM10L 



CHPAOQIOO 

C»«»AZQU> 

CHPOAAM 

CHPOQOUOO 



HUMWCnt 
HUMIF3HQ7 



HUUA1ACM 

HUUAIAftt 

HUMIATS 

HUyAIATM 

HUMAtATP 

HUMAtATD 

HUMAIATfO 

HUM^IATZ 

HMMIOLVa 

HUMAtOLVC 

HUMAIOPia 
NUMA1QPU 

HUMAaM 

HUMAanA 

HUMAMI 
HUMA^m 



HUMAOCVM 
HMMCCYW 
HUMMHM 
HUMACMMV 



HUMCTM 

HUMMTA 

HUMACTAM 

HUMACTCM 

HUMACTOA 

HUMACTQNM 



HUMAONICA 
NUMMMtCt 
HUMOmC 
HUMMMBA 
HUMAOHICI 



AFRKM4 onsN io«(rr nc-i COL olOflmt »*woa coM>icTE 
AmjkN oicEN yoMccY CMOS pnaro<MoooBS. 

9tMMN TMNBKMMNQ (MOWTH FACTOMETA (TOF) MRNA, OOIMTE 
OML IMNICY (A.TIMmATU« VOUUN QOC. OOMFim CDS. 
OM. I«M(CV MMUJCfW (lAMX Aa£L£» OK OOMPirrC CM 
CM iAIOON ALPHA-1 QUOWN Q&C 

iABOON (P.»MMA0nnrA9 iMUMB) AlCOHOL DCHVmOQaMSE (AOH) 

•A0OON M>OUf>OmorEm E (APOC) QCNE. OOMPim COS. 

IMCON KTACHOnKMC QOfMOOrmPIN (BETA^ IM4A. COMPUTE 

BMOON LVIff>HOCVTE HOIM«MI»CBKM PECCPT^ 

OUVE iAiOON ALPm^tOltMJC nCTA (t>OtOtil QCNE. 

CHMi«»ANZEE FETAL A«AIAIA^aU3eM GENE. 

C»fMPANZEE QCNE FOR AimUJKE znA-l-QUOMN. 

CMMRANZES MRNA FOnCHLMI PftCTEM. 

CMUPANIEE MMA FOR C»LAAiai PnOTCK 

CHMPANZEE FETAtQ Q AMt^ QLOeWOEME. 

AFC (CHM>AKIEE} AlPHA-l-ObOlM MRNA. 

APE (CHMPAMEE) AIPHA»OtC«M MRNA. 

CHMPWIIEE (P.PAMKUB) KWQLUCIWi COMPiETE COS. 

CHMPANSE MC CLAK I OAA OWN WMA, COMUTE COS. CIM 

APC{aM0N)MTB&aMN2MnNA. 

AK (Oam ifrBftCUNN t MRNA CONT AMNQ MBERTGD LTR 
QNMN MTBUUnN » OU) MRNA, COMPLETE COS. 
HUMAN VfTEnFENCM MDUCCO ^aA SYNHCTASE MRNA. COMPIETE COS. 
HUMAN romB0MU.»OX)ACYL<X)A T»«XAK MMA^COMRJETE COS. 
HUMAN 4F9 ANTKI0I XAW CHAM MRNA, OOMPl£TE COS 
HUMUI4R<ILVOOSVUTB)»CWrCHMN(4FaHC)ANnQENaEICEN0N 
HUMAN tMM. R«0NUClfi)PfOrEM (RO) MMA, OOMAjETE COS. 
HUMAN Mfy^fOWPOtYPgTBim 

HUIMN MrCRKROIM«MaBLE mOTEM M7 MRNA. COMPLETE COB. 
HUMAN AimA-l-ANTCNVMOrTRVPSM COMfUTE OENE. MRNA. 



HUMAFU 
HUM^ 



HUMM ALPHA-l-AWrrTRVPSM MfSSk COMPLETE COS 

HUMAN ALm^l-ANTrmVPSNI MRML O0MPi£TE COS. 

HUMkN AtmA-l-ANTnTfrPSMOENE (i VARHNTX COVtETE COS 

HUMAN A4mA-1<ANnTRVPSM MRML OOWlCrE COS. 

HUMAN MACnOFHAOC Aim^1<AimrmFBIH(ALmA1<AT» MRMk ff-ENO 

HUMN 2 TVPf MPm-I^ANTirRYPMN OEIC COMPIETE COS (EXONi 

HUMAN AU>HA>1-ACI> QLV0aPfVrEM8(MPa)0E»C. COMPIETE COS 

HUMMMMAroRALPHA-1-AC»OLVG0PROrCM(0n3S0MU0OI)). 

HUMN ALFHA.f -ACD OLVCOPnoreN OeC iL EXON A 

HUMMMmA-l-AOD OLVOOPftOreN OBC S EXON S 

HUMAN Mm^l-ACOOLVOOmornM GENi S. EJttN S 

HUMAN MRNA F0WAL g<j^i 4SCRD fllgy jN ANDHMS 

HUMAN AimAMASIM SSMITOR (AlJm«fl) MRNA, OMPLETE 

HUMAN ALPHAMAMM SMrrOROCNE. EXON ia 

HUMAN AimA^TMOt mOVEMASC MMSrrOR MRNA. OOI#l£TE 



HUMAN mUMCHAM ACVLCOA OOfyOnOOENAK (ACAOM) MRNA. 

HUMAN CVTDRLAMMC SCTAACTW QEN^ OOMRLETE COS. 

HUMAN CVTORUMMB KT^UCTM QENE, COMPLETE COS 

HUMN MRNA FOR MUBClf ACCTYLCHOIME RKEFTOR ALMASUBUNrr. 

HUMAN ACCTYICNOUNE RROTOA AimASUSUMT QDC EXONS PI 



HUMAN M MUSCARMC ACCTVICHOUNE RKEFTOR aOC 
HUMAN TArnnATB-RMTANT AOD PHOSPHATASE TYPE B MMA. 
HUMAN MRFM FOR ACNOSSt^AA^.IO^ 
HUMAN ACnVATOI {MT4) MRNA. OOMim COS. 
HUMAN MRNA FOR VASCULAR SMOCrrHMUaCLE AU>HAACTML 
HUMAN AOULT SKELETAL MUBCU ALPHAACTW MRNA. 
HUMAN AtmACARDMC ACTM OEfC ENONA AND y FIAML 
HUMAN erTCSKaErAL0AMMAACnNa8<,C0M>l£rE COS. 



HUMANCtASSIAlOOHOLOOWPRDQDiAM (AflMI)MWAaumT 
HUMAN CL ASS I AlOOMOLDEIMlROQ EN A K yWI) ALPHA Kmsr 
HUMANCtASS I AIMNOL DEHVCROQBMIE KTA-f SUaUI«r. AUAE 1 
HUMAN CUSS I MOONOL OefronOOBMaC (AOHQSETA-I SUSUWT 
HUMAN CU S S I AIJ0OH0L0CMVOROaCNASE(AOm)KTA-1 SUBUMT. 
HUMAN CUSS I ALOONQL OEHVOROOENAaE (A0HB)KrA>1 SUBUMT 
HUMAN CUSS I A100H0LDOfnR0ae«A8E(A0M)KTA-1 SUSUMT. 



HUMUIHR 



HUMAlOOSi 



HUMAUFQA 

HUMIPIQC 

HUMAIPHA 

HUMAIPHS 

MJMUmA 

HUMAUil 

HUmPtt 

HUMUPL12 



HUMAIPPB 

HUMAUiRCI 

HUMAU>PO 

HUMU>R 

HUMIA 



HUMAMV310 
HUM^ 



HUMANFA 
HUMANS 
HUMMr?D 



HUMANT^^ 



HUMMTCOat 

HUMANTCE 

HUMAHTU 

HUMANTLAA 

HUMU4TLF3 

HUMANTNC 

HUMANTP 



HUMAPC3Q 
HUMAPQA 



HUMAPOMS 
HUMAR0A4C 
HUMARQAtt 



HUMAPQAIP 
HUMAPCMT 



HUMPOCa 



HUMPOa 
HUMAPOCIA 



HUMAPRTA 
HUMAR 



HUIMN AlPmmOPROrTBN tAFP) URM^ COHPIETE COS. 

HUMAN SERUM MSUMM OCNE. COHPl£TE COS. 

HUMAN AOPOCVTC UPKVSMDINQ PnOTEK OOMPIETE COS 

HUMAN FRUCTOSE 1 >WPH08PmTA$e MRML COMPLETE COS. 

HUMAN AUX9UBE A MRML OOII>l£TE COS. 

HUMAN FNROtUST MRNA FOR A1D0LM6 A. 

HUMAN AtDOUSE S WNA. COm£TE COS. 

HUMAN AUX3U8E ft C*LD0S)QOe EXONB 7 TmOUSH A 

HUMAN AUXJUSE B MRML COMPIETE COS. 

HUMAN AUWLASEC OE>g. 

HUMAN ALO0U9E C OENi FOR FRUCT06E 1>ai9PH0SPmTE 

HUMAN RNA FOR MnOCHOMMAL ALOemS 0EHVDR0QCNA8E I ALDH I 

HUMAN MRNA FOR MTTOCHOMMM. ALOemiC OCHirOROQENASE (AlOH 

HUMAN AUMUSC AQEIC (EC 41.ft.in 

HUMAN ALOOUSE B QEIC EXON « AND 7 FLAMC 

HUMAN imM FOR ALDOLASE S 

HUMM MRm FOR APOUPOPROTBN A. 

HUMU4 MJCALBC PHOSPHATASE (Al>-1 ) MRNA. COMPLETE COS 

HUMkN MJUUNE PHOSPHATASE QENE, OOM>i£TE COS. 

HUIMN AOULT tNTESTMAL PHOSPHATASE TYPE. COMPi£TE COS. 

HUMAN U/ER PHOSPmTME 2A MRNA, COMPl£rC COS. CU3NE HL-1 4. 

HUMM BfTESTttML ALXAUW PHOSPWTABE QENe COMPLETE COS. 

HUkMN BOESTWMLALKAUNE PHOSPmTASE MRML COMPLETE COS. 

HUMM UVEMONEMONEV-TYPE ALKAUNE PHOSPHATASE (ALP) 

HUMN RACafTAL ALXALME PHOSRMTASE (RAP) MRNA. COMPLETE 

HUMM PLACENTAL ALKAUNE PHOSPIMTASE TYPE 9 MRNA. 

HUtWN PLACBTTAL ALKAUNE PHOSPHATASE MRNA. COMi>l£TE COS. 

HUMAN PUCENTAL»CAT-ST ABU ALKAUNE PHOSPHATASE (PUR-1) 

HUMAN MRNA FOR ANTlEUHOPnorEAaE (ALP) FROM CBMX inERUB. 

HUMAN ALOEHVDE REDUCTASE MWM. COIS>LETE COS. 

HUIMN ALDOSE REDUCTME MRNA, 0OMPl£TE COS. 

HUIMN SA0EN08VUCTM0MNE OGCARBOKVIASE MRNA, COMPUETE COS. 

HUAWNAfcWOPEPnPASEIiCOItMRNAENOOOSttAMB M ' C I' HAAa iN. 

HUIMN PANCREATC ALPHAAMnASE (AMVq OML EXON ia 

HUMM rSLET AMVtOO PROTEM IMM, COMfUTE COS 

HUMNANDROQEN RECEPTOR (AR) MRML COMPtfTE COS 

HUBMN mTRRJREnC FACTOR COMPLETE 

HUMAN LUPUS P70 pU) AUTQANnOEN PROTON HPNA. COMPLfTE COS. 

HUMAN HEMfT^SKELETAL MUSCLE ATRMOP TRANBLOCATOR (ANTI ) 

HUMM 8SWUR0 RMONUCtEOPnOTEM AinOANnODI M M) SUBUNTT 

HUHN OrreCNnATKM AHnOEN (CDit) MRML COMPIfTE COS. 

HUIMN SURFACE ANnOENCOa MMA. COMPUTE COS 

HUIMN COM ANTUEN MRM, COMPUTE COS 

HUMN CARCMOEMBRYONC ANTIQEN QENE. COMPLETE COS. 

HUMM LVimOCVTE ACTIVATION ANnoai 4Fa LARQE SUSUMT MRNA. 

HUIMN U ALfrOANnOEN MRNA, COMPLfTE COS. 

HUMM MRNA FOR LYIS>HOCYTE FUNCTION MSOCUTEO ANnOEfM 

HUMAN NQNBPeCIFC CROSSREACTMQ ANTIQOI MRML COMPLETE COS. 

HUIMN M R NA FOR RED CEaANWNT m NB P O n TPRCrTEBl 

HOMO 8APCNB, HUMMI TYPE S (TARTMTE'RESNrTANn ACIO 

HUIMN MRNA FOR AMVUOO PROTTGN PRECURSOR A*-7St. 

HUIMN QENE FOR APOUPOPROTEM C-Ml 

HUMAN MRNA FORUVER-TTFE ALKALJNE PHOSPHATASE (EC 9.1 X\y 

HUMAN MRNA FOR APOU' O Pi m UNUM. 

HUMM APOUPOPROTEM M OBC OONMETC COS. 

HUIMN APOUPOPROTEm MV OeC COAMTC COS. 

HUMAN APOUPOPROTEM MV08«. PARTIAL COS. 

HUMAN APOUPOPROTEM M (AP0A4) SM, EXQNB I . a AM) 1 

HUMN APOUPOPROTCM M AND C« QEICS. COMPLETE COS. 

HUMM APOUPOPROTESI (kBK A4 ON CMIOMaSOIC 1. 

HUMAN PREPRQAPOA4 MRML COMPUTE COS. 

HUIMN APOUPOPROrEM A4 OCNl OOmSTB COS. 

HUMAN APOIM>RCrrEM B-tOO MRNA. OOMPLETI COS. 

HUMAN APQUPOPROTEBI B-MO MRNA. COMPLETE COS. 

HUMAN APO CI OENE ENOOOtn APOUPOPROTEM OL COMPLETE 

HUMM APOiPOPROTEM C« QENE. COMPLETE COS AND AUHJKE 

HUMAN APOUPOPROTEM C« (APO&K) WMfL COMPUTE COS. 

HUMAN APOUPOPROTEM C4i (APO&W) MRNA, COMPUTE COS. 

HUIMN APOUPOPROTCN C4 MML 

HUMAN APOLPOPROTEil C^ (VLOgOENE, COMPUTE COS 

HUMAN APOUPOPROTEM 04 OSC COMPUTE CDS 

HUMM MiOUPOPROTEM 0 tSSM, COMPLETE COS. 

HUMAN AP0LK3PROTEM 0 QENE. ENON 4 

HUMAN APOLmOraN E IM4A. COMPLETE COS 

HUMAN APOUPOPROreN E (EPSaON a AND a ALLELES) MRNA 

HUMAN APOUPOPRDTEIN fi (B>SA0N4 ALULE) QENE. COMPLETE 

HUMAN APRT OENE FOR ADEMNE PHOSPHORnOSYtrTMNSFERASE. 

WMAN AOOMC PHOSPHORMOSYLTIWFERASE (/tfRT)QB<E. 

HUMAN ANOROOEN RKOTOR MRNA. COMPUTE COS. 



HUMAN CLASS I AlCNOHQL DCmMOQENASC (AOM) PI SUSUMT 

HUMAN AOMi MRNA BCOOSIO ALCOHOL OEHrOROQENME CLASS I GAMMA 

HUMAN NM>f AOPMBOSniRMBFERASE (AOPRT) OEIC EXON aai 

HUMAN PUTMTAiPIIA i A O R PiroC RECenORQPC. COMPLETE 

HUMAN PUTELETALPIIA a AD R BCmC RECEPTOR MRNA. COMPLETE 

HUMAN ALPHA a AORDCRSC RECEPTOR OEIC. COm£TE COS 

HUMAN KTA-l-AfiRDCROe RCCSPTOR MNA. COMPLETE COS 

HUMAN BCTAS^ tt R EI CRQI C RECEPTOR MRNA. COtyLETE COS 

HUMAN BETA«AORENBUC RK9T0R QENC, COMPLETE COS. 

HUMAN AORMOOBM RBXJCTASS MNA, COMfUTE COS. 

HUMAN AMONDimi^^ 

HUMAN APOPIRRfTM ^ CHAM) MRNk 

HUMAN OEM FOR L APOFERRTTW EXONB 3 AND 4 

HUMAN MPHAATOPnoriM OeC OOIMETE COS. 

HUMMI AMnOOM MRML OOMPlfTE COS 

HUMAN AlPHAOALACTOSilASE A MRML OOMPlfTE COS 

HUMAN ANOnOEMN QBC COmETE COS. AND T>«K AUI 

HUMWI ALPHA-1 ACO OLVOOPROrCM MRNA. COMPICTE COS 

HUMAN AL J% M a IgO L VCOPWCI HM ALPHA AM) BETA CHAM MRNA. 

HUMAN ANOnUUSM kOOMCRTMO ENZYME MRWL COAfLETE COS. 

HUMAN CYTOSOUC AOOmATC KSMSE (AK1) QENE. COMPLETE CfB. 

HUMAN Da.TA AMNOL XV UUNATE DEHYDRATASE MRNA. COIS>LETE CDS. 

HUMAN SERUM ALBUMN MRNA. COMPLETE COS 



HUMAN ANOROQEH RECEPTOR tSML OOlPUTE COS. 

HUIMN QEPC FOR AROMASfi EXON a AND y<fLAI«a«l RCOON (EC 

HUMAN LMER ARQSMSfi MRML COMPUTE COS. 

HUMAN AR0MATA8E (AROI ) MRML COHFUTE COS. 

HUMM AROMATASE MRML OOMAETE COS. 

IATE8YimCTABEOM.EX0NB 1CL 11. ia AND 
HUMAN ARGMNOSUOCMATE SVNnCTASE MRML COMPUTE COS. 
HUMAN ARVLBU/ATASE A MRNA. COMPUTE COS 
HUMANA SW bOO L r UJPHH IUNR UUtP I IJM HI ISMA.OOIVLETBCOS 
HUMAN ASMUOOLVOOPROTaN RECEPTOR H> IMM. COMPUTE COS 
HUMAN A »W MMN06 UCC i»» T E LYASE MRML COMPLETE COS 
HUMAN A n BNIWSUC C M A TE LYASE MRNA, COWLETE COS. 
HUMAN ASPARAQBC SVNT>CTASE IMW. COMPLETE COS 
HUMAN ASPARTYLTRMk SYNTHETASE AIPHA« SUSUMT MRML 



HUMATIXI 

HUMATC 

HUMATOCOS 

HUMATCra 

HUIMTCT4 

HUMATCTIA 

HUMATH3U7 

HUkUTP 

HMMTPAR 

HUMkTPBR 



HUMAN ANTTTHROMSM M OBC EXON A 
HUMAN PLACENTAL ANnCQAaULANT PROTEM (RAP) MRML COMPLETE 
HUMANT-GEaSURFACCMTiaENCOecriUMRML COMPLETE COS 
NUMNT-Cai SURFACE ANTKlCNT3DCLTAOMMaENe,EX0NS 
HMMH T-CBX SURFACE OLYOOPROTEM T4 MRNA. COMPLETE COS 
HUWkN T-CEU SURFACE PROTEM T« MRNA. 
HUMAN (OYSFUNCTKXMU ANTTTHROMBM M (ATM) UTAH QQC 
HUMAN PLASMA MEMBRANE CALiCMI ATPASE MML COMPLETE COS 
HUMAN MRM FOR WLKATPA9E ALPHASUBUMT. 
HUMAN MRNA FOR NAM-ATPASE BETA SUSUMT. 
HUMATPC

HUUATPFIB 

HUUATP8V2 

HUHB1LYM 

HUMa2tt2 



HUMBARR 
HUMBCAT 
HUMBCOF 



HUUeCTHA 
HLMBCTHB 



HUMDOALRP 
HUUaQPQ 
HUMBOPI 
HUMBHAU 



HUMBLYMt 



HUMBMP1A 
HUMBMP2A 
HUMBMP28 
HUMBMPM 



HUHBTIP 

HUIIC1A2 

HUMCltMA 

HUMCIiMB 

HUMCtR 

HUMC1RB 

HUMCtRX 

HUMCISAB 



HUMCA2 

HUMCACV 

HUUCACVA 



HUMCAM) 

HUMCALBft 

HUMCALC8E 

HUHCALCm 

HUMCAUA 

HUUCAUAM 



HUMCANPR 

HUMCAPQ 

HUMCAPPTA 

HUMCATOC 

HUyCATQia 

HUHCATHL 

HUMCATP 



HUMCBP 

HUMCePE 

HUMCCQ1 

HUMCaO 

HUMC014a 

HUMC01« 

HUMCOtBI 

Huycotot 



HUMCDZm 



HUMC0C3 
HUMCOWfUA 

HCMCFANT 
HUMCfTRM 
HUMCFVM 
HUHCnOU 



MMCHRAA 



HUMAN AOP/ATP CARRIS) PROTEIN UPHA. COUfUTE COS. 

HUMN Hnm FOn F1-ATPASE beta SUBIMT (F-1 BETA). 

HUMAN ATP SYNmABE BETA SUBUNTT QSC. EX0N3 ft-l a 

HUMAN fr4.YMPH0CYTE CEH-BUVACE AHTIQEN Bl (C0»^ 

HUMAN BETA^IBCnOGLjOBUJN QENE, EXONS 3 AND 31 

HUMAN OENE FOR BETAAOnENERQIC RECEPTOR (BETA-Z SUBTYPE). 

HUMAN MRNA FOR BRAM BETA-AOnBCnOIC RECEfTOa 

HUMAN BRANCfCD C»«UN ACYLTRANSFERABE MRN^ OOMPIETE COS. 

HUMAN BCEllQRCWTH FACTOR MRNA, COMPLETE C0& 

HUIMN BUXO COAGULATION (NHWrTOR MRNA, COMPIETE COS. 

HUMAN CI ffMBTTOR OENE. EMON li 

HUIMNB«EU.LeUK£MUM.YMPH0MA2(BCt^PnafTC>0NC0GENE MRNA 
HUMAN BCEIJLLEUICMUA.VMPHOIM 2(BCL4)PnOTa0NC0GENE MRNA 

HUMN BCR MR^W (BREAK POINT CUUSTER QENEX 

HUMAN BRAIN-TYPE ClATHRM UGHT-CHAM A MRNA, COMPLETE COS. 

HUMAN BRAIN-TYPE CLATHRIN UQHT-CHAM B MRNA. COMPLETE C0& 

HUMAN BETA QAUCT06I0A8E OENE FOR BETA^OALACTOSIOASE. 

HUMAN BETVOALACTOSIDASE (QLBl ) MRNA, COIfflETE COS. 

HUIMN BETA OAUOOSIOASE OENE FOR BETAOALACTOSIDASE RELATED 

HUMAN OENE FOR BOC OLA PnOTElN (BOP). 

HUMAN BlUARY OLVCOPROTEM I (BOP I) MRNA. COMPLETE CDS 

HUMAN BETA4CX0SAUINCUSE AU>HA GENE, EXON 1 4. 

HUMAN HVrDtNE-nCH PROTEIN MRNA. COMPLETE COS. 

HUMAN BLYM-1 TRANSFORMNQ GENE, COMPLETE COONQ REGION. 

HUMAN MRNA FCR EXTRACaUJLAR MATRIX PROTTEM BU^ 

HUMAN BONE M0RPHOGENET1C PROTEM 1 (BMP-1) MRNA. 

HUMAN BOfE M0RPH0GENET1C PROTEIN 2A (BMP4A) MRNA. 

HUMAN BONE MOPmOOBCTC PRCTTEtMS (BMP«) MRNA. 

HUMAN BOIC MORPHOQENETC PR0TEM4 (niP4) MRNA. 

HUIMN MRNA FOR B4frB QCNE. 

HUMN BNBt MRm. COMPUETE COS. 

HUIMN MISPHOSOPHOQLVCCRATE MUTASC (BPOM) OENE. EXON 1 

HUIMN ERmfftOGYTE ^MtaPNOSPHOQLYCCRATE MUTA8E MRNA. 

HUIMN (BSF^AJI) OENE FOR B CEU STIMULATORy FACTORA. 

HUMM BETA-nffOMOGUOaUUIHitE PROTBN MRNA, OOklHfTE COS. 

HMMNOOUAGENALPMMTYPEIMRm, COMPLETE COS. 

HUIMN l\AaMA PROTEASE (CI ) NMBTTOR MNA. OOI«>L£TE COS. 

HUMN PLASMA PROTSkSE tCI ) mmrrOR IMM. COMPLETE COS. 

HUtMNCOMPLBCNTCIR MRNA, COMPLETE COa 

HUMN COMPLEICNT OOMOCNT C1R MRMk COMPLETE C0& 

HMMNCOMfUMENr OOMPONB«T CI R MRNA. 

HUMtfl COMPLEICNr SUBCOMPONENT Clfll ALPHA- AND lETAOWIW^ 

HUIMN OCiMMEHr COMPONENT C» MRNA. ALPHA AM> BETA SUMMTTS. 

HUMkN COLLAGEN TYPE IV ALP»M4 C»MBI WNA. COMPLETE COI^ 

HUWN COMPLEMENT COiraefT OB MRNA. OOMilJETE COS 

HUMAN COMPLEMENT PROTEM COMPONENT C7 MRNA. OOMfUTE C0& 

HUMN COMPLEICNT PROTEM CBALPML BUeUMT MRM^ COMPETE 

HUMANCOMPLB»<T PROTEM CB BETA 8UBUMT MRNA, COMPLETE COS. 

HMMN COMPLEMENT COMPONBfT CMAMMA MRNA. COM>L£TE COB. 

HUMM COMPLBiefT PROTEM C« OAMIM SUMMIT MRNA. COMPLETE COS. 

HMMN OOMnaCNT COM>ONBfr OB MRML OOMPlfTE C0& 

HUMAN MRWk FOR CARBOMC AWffDRABE I (EC 4 J.1 .U 

HUWNCALCVCLM QEIC. COMPLETE COB. 

HUMN PROIACTM RECEPTOAMBOCIATED PROTEM (PRA) QBC. 

HUMUICARB0NCA»*1VDRAflE I MR^^ COMPLETE COS. 

HUIMN CARMNB MtfrCmaC ■ GENE, ENON 7. 

HUMMCO»M FORCARBONC AMfTDRABE L 

HUIMN CAUNTOMK IMMk COMPLETE OODHQ SEQUENCE. 

HUMAN MRNA FOR Z7-KDA CALMNDM 

HUIMN BEN UUNQ CARCBKUM CEa L«C MRNA FOR MQKM(R) 
HUIMN CAlOFONMCALCrTOMNOBC-RELATED PEPTDC GENE, EXON 
HUMM COMMON ACUTE LYMPHOBLMTC LHBCIIA ANnOEN (CALIA) 
HUMM CAILAMEP OENE ENOOOMQ ICUTRAL END0PB>TIM8E, EXON M. 
HUMN MRML FOR COMMON ACVTE LYMPHOCYTIC LfiJKEMA ANTIGEN 



HUMCINHP 

HUMCIX 

HUMCX8 



HUIMN CALMOOULM MRNA. COMPLETE COBl 

HUMtf* MRM FOR CAUnUM ACTIVATED NEUTRAL PROTEASE LARGE 

HMMNCATfrSiSM 0 QEIC. OOMIUTE COS. 

mMN CAMP DEPENOEWT PROTEM KBMBEREQUUTORYSUSUNff TYPE 
HUMN IffVM FOR CATHB»8M D FROM OESTROGEN RESPONSn^ BREAST 
HUBMNQEICPORCATMABECeC 1.11.1.B)EX0N 13 AND T FLANK 
HUIMN M(MA FOR PROCAT»CP8M LCMAKM EXCRETED PROTEM HEP). 
HUMAN KOCV MRNA FOR CAT ALABE. 

HUMN CORnOOSTEROO BMDMG QLOeUM MRM^ COMPLETE COS. 

HUMN GAMBOHQ PROTEM MNA. COMPLETE COB. 

HUMAN CALELGCTRM MRNA, COmETE COS. 

HUMNXCHROMBOICISVMFORCCai PROTEM BW. M CEU 

HUMAN CHOLECYSTOMMN (CCK) GENE, EXON 

HUMN GENE FOR 0014 DIFFERENTMTIONANnGEK 

HUMM MRNA FOR PC GAMM RECEPTOR (FCRBLCOI^ PCR-tO^ 

HUIMN OORHCM. THYMOCYTE ANTIGEN COIBC EXON 

HUIBLN00RT1CM.THYM0CYTE MfTIGBl CD1C QC»C EXON & 

HUMN MR»M FOR ■ LYMPHOCYTE ANHQEN COaO (11. BPJB^ 

HUIMN IMM FOR COaO RECEPTOR (S7X 

HUIMN T-CEU SURFACE ANnOBt Til (COa)GBC.EMN& 

HUIMN T-CCLL SURFACE iMTIQEN COB rn 1 ) MRNA. COIMTE COB^ 

HUBMN COa MmOEN QBC EXM B 

H UIMN I W M FOR coy ANTIGEN (0P4BV 

HUIMN coca GENE BMOLVED MCEaCVCLE CONTROL 

HUIMN C0WH4 ANT1QEN. OOimETE COS. 

HUMU4 CSUjOPLASMN (FBVOJeOASE) WNA. COMPLETE COS. 

HUIMN MRNA FOR CYSTIC FIB R OSIS ANTIGBUCFAS). 

HUIMN CYSTIC FMROSn MRNA, ENCOOtNG A PRESUMED 

HUMAN BLOOD OQAQUIATKM FACTOR V« GENE. COMPLETE COS. 

HUIMN OOMUATKM FACTOR XI GENE, EXON 1 & 

HUIMN BLOOD CQAGUULTION FACTOR Xn GENE. EXONB B-l 4. 

HUIMN CHORWNC QONAOOTROPM (HCa» OENE 8. BET*««ONrT. 

HUMM CHORKMC QONAOOTROPM (HCQ) GENE B. BETASLWMT. 

HUBMN CHORKXiC QOIMOOTROPM (HCO) BETA BUaUNn^ MRNA. 

HUIMN CHOnONK QOMU)OmOPM BET A SUeUNTT GBC. EXON 9. CUONE 

HUMMCHORKMC OOfMDOnWPM BETA SUeUMT G8C EXON 1. CLONE 

HUMM OQMP PHOSPHOOOTERABE ALPHA BUaUMT (0GPR4) MRNA. 

HUMWBUTYRYLCHOUNBBTERASE. MRNA COMPLETE COS. 

HUhHN FETAL BUrVRVLCHOLJNESTBMSfi MRNA. COMPLETE COS. 

HUlMNCmOMOaRAMN A IBVM OOBBHETE COS. 

HUIMN CWIOMOQRANm A MRNA. OOBTIETE COS. 

HUIMN MRML FOR SECRETOORAMN I ^mOMOQfMNN BV 

HUMAN LiPOPROTEBMBSOCMTEOCOAOUUTKM BMBTTOR MMA. 



HUMCKBBA 
HUMCKMA 
HUMCKMMB 
HUMCKMT 



HUMCffi 
HUMCNPBB 
HUMCNPDEA 
HUMCNPQB 



HUMCOAin 

HUMCOliR 

HUMCOVIC 

HUMC0X4A 

HUMCOXM 

HUMCOXCA 

HUMCP2110 

HUMCP2tOH 

HUMCP210HC 

HUMCP48W 

HUMCR1 

HUMcmpa 

HUII^PQ 

HUMCRPQA 

HUMCRYBAS 

HUMCRYQAa 

HUMCRYGBC 

HUMCRYQOa 

HUMCRVQCM 

HUMCRYQX2 

HUMCRYQX4 

HUMC91 

HUMCSF1MS 



HUMCSFM 



HUMC8PB 

HUMCSPG1A 

HUMC9YNA 

HUMCTFI 

HUMCnO 

HUMCTHG 

HUMCT88 

HUMCVM 

HUMCY4ARD 

HUMCYIB 

HUMCYC1 

HUMCYC1A 

HUMCVCAA 

HUMCVCR 

HUMCYES1 

HUMCYL 

HUMCYP14B 

HUMCYPMB 

HUMCYP«aO 

HUMCYP4BC 

HUMCYMA7 

HUMCVPAX 

MJMCYPB 

HUMCYPC17 

HUMCYPCN 



HUMCYPHIP 

HUMCYPNE 

HUMCYM 

HUMCYPNO 

HUMCYPNQA 

HUMCYPBCC 

HUMCY8M3 

HUMCYSW 



HUMDBLPRO 
HUMOSLTP 



HUMDCOK 
HUMDEF1A 
HUM0EF1AA 
HUMOEFA 



HUMOMPR 

HUMDOCW» 
HUMD0NT11 



HUMEBUR1B 
HUMEBVR 



HUMAN PLASMA SERBC PROTEASE (PROTEM C) INHWTOR MRNA. 

HUMN CQAOUAT10N FACTOR Dl HRM. COMPLETE COB. 

HUMAN CREATBC KMASE-B MMA. COMfUTE COS. 

HUIMN CREATME MfMSE BOSVME CtCS GENE, EXDN H 

HUMMCREATME MNABE SSUSUMT MRNA. COMPLETE COS. 

HUBMN CREATBC KBMSE M IMM. COMPLETE COft 

HUMAN MUSCLE CREATINE KMASE GENE (CKMMX EXON & 

HUMAN MTTOCHOfCRUL CREATINE KINASE GENE. COMPLETE COS. 

HUIMN COBB^LBIENT CVTOLYSe BttBTTOR (CU) MRNA, COMPLETE 

HUBMN HUMOS OENE HOMOLOQOUS TOTRANBP0RMB40 GENE OF MM9Y. 

HUIMN SKMCOUAQENASE MRNA, COMPLETE COS. 

HUMAN BLUE CONE PHOTORECEPTOR PtOMENT GENE, EXON & 

HUMAN ^^-CVCUC NUCLGOmOE 9-PH06PH0OIESTERA8E MRNA. 

HUMU4 GREEN OOC mOTORECOTOR PIGMENT GENE 1. EXON B 

HUBMN RED CONE PHOTORECEPTOR PK3MENT GENE. EXON 

HUMN PROALPma (I) COLLAGEN GENE TRANBCRB>TKm START 

HUMAN MRNA POR COILAQENASE (EjC. &4.M> 

HMMN MRN^ FOR CYTOCmOME C OKKMSE 9UBUMT ynC. 

HUMN CYTOCHROME C OOODASE (COK) SUBUNT IV MRML COMRJETE 

HUMN CYTOCHRONC C OaOOMSE SUeUMT VM MRNA, COMPLETE COS. 

HUMM CYTOCHROME C OXKMSE SUeUMT VB (CQXVB) MVM. COMPLETE 

HUIMN MUTANT CYPaiB OeC 0COOINQ AN ABBREMATEO 

HUBMN ai-monOXYLASE B GENE. COMPLETE COS. 

HUMM MUTANT 214M]RaXYLASE B GENE. COMPLETE COB. 

HUMAN UMQ GYTOCmOHE MSB (rv 8UBFAMB.Y) Bl PROTEBi 

HUMAN MRNA FOR COMPlBIBfT RECB>TOR TYPE 1 (CRl.C3aC46 

HMMN CELiUAR RETMOL BBONG PROTEM (CRBP) QENC EXONS B 

HUMM CARMNYL REDUCTASE MRML COMPLETE COS. 

HUMAN COHUCXIIHOHH-RELEAStNQ FACTOR (CRF) GENE. 

HUIMN &REACTIVE PROTEM GENE, COMPLETE COS. 

HUMN CREACnVE PROTEM GENE. COMPLETE COS. 

Wmn BETA^At-CRYOTALUN GENE (HUWA^IK EXON «l 

HUIMN OA MBW A CRYSTALJLM GENE (QAMIMOB). EXON 1 

HUIMN GAMMA B CRYSTALLM(OAISM t^ANDQAIMU^CCRYSTALlM 

HUIMN GMaiAftCRYSTALUN GENE (GAMMA-3) GENE. EXON 31 

HUBMN QAMBM DCRYSTALIM (GAMMAS) GBC. EXON 1 

HUMAN GAMIMMIYSTAUJI GENE (GAMMA 1^ EXON 1 

HUMN GAMMAC^RYSTALIM GENE (QAIMA a-U EXON & 

HUIMN CHORKMC SOMATOMAMMOTROPIN GENE HC8-1. COMPLETE COS. 



HUMMN MACROPHAQE-SPCCmC OOIONY-STNAJUTINQ FACTOR (CSF-1 ). 

HUIMHT-CCU.QRANUjOCYrEMACnOPIMGE COLQMY BTMUATMO FACTOR 

HUMM GRANULOCYTE'MACROPHAQE C0L0HY4T1MULATMG FACTOR 

HUIMN QRANULOCYTEMACROPHAQE OOU0NY4TBiUUTINQ FACTOR MRNA. 

HUIMN MULTNJNEAGE-OOLONY-SnMULATMG FACTOR IMM. COMPLETE 

HUMM kiVM FOR&SM OBC (CLQIC PBM-H 

HUIMN 8IM0k4a MRNA FOR COMPLEUENT-ABSOCWTED PROTEIN 

HUIMN BERSC PROTEASE B (CSF«) QBC COIinfTE COB. 

HUMAN CHONDnOfTBMERMkTAN SULFATE PROTeOOLYCAN (PO«0t CORE 

HUMkN CSYN PRCTOONOOGENE. COMPLETE COS. 

HUIMN IMM FOR CAAT-SOH BItOilQ TRANBCnPTKM FACTOR CTF-I 

HUIMN CATHEPBM D MRNA, COMPLETE COS. 

HUIMN CATHEPBM Q MRML OOMiLETE COS. 

HUMN CATHB>SM B PROTEiMSE MRML, COMPLETE COB. 

HUIMN MRNA FOR CYTOCHROME C OMMSE SUBUNTT Vn (EC 1.B,B1). 

HUNMN M0MATA8E 8YBTEM CYTODSVME P^ (P4B0}aX) MRNA. 

HUIMN CYTOCHROME BB MRN^ COMPLETE COS. 

HUMAN MR»M FOR CYTOCmME CI. 

HUMN CYTOCHROIS C-1 GE»C COMPLETE COS. 

HUIMN SOMATIC CYTOOBWMEC (HQS) QEIC COMPLETE CDS. 

HUMU4 MRNA FOR T'CELLCYCLOPWM. 

HUMAN C-YES-1 IMNA. 

HUIMN GYCUN PROTEM GEIC COMPLETE COS. 

HUMAN CYT0C»«OC P-1 -4B0 (TCOOMDUCMLE) MRNA. COMPLETE 

HUMAN MMA FOR CYTOCHROME PMa 

HUIMN GEIC FOR CYTOCtAOME RIHBOl 

HUMkN CYTOCMOC P-4B0C GENE AND FLANKMO REGOe. 

HUMM CYTOCmOME PMBO 4 GEIC EXON 7. AM) THREE ALU REPEATS 

HUMAN CYTOCmBK P4a0 1 FUNCTKMAL FORM MR»M. COM>L£TE COS. 

HUMM CYTOCHROME P^ 1 MMk OOMdETE COS. CUONE ^-2. 

HUMAN CYTOCmOME P«OC17{8TB«0 17-ALPHA4fY0ROXYtASEn7^ 

HUMM CYTOCHROME P4B0 PCW GE>C COMPLETE COS. 

HUIMN CYTOCMUME FMBO 0S1 IMM. COMPLETE COa 

HUMAN LIVB<GUJ C O CO RTICOK>B€)UC»L E CYTOO«WBCP^(>tP> 

HUMAN CYTOCMtOME P480BE1 (ETHAN0L-B<XJCI8L£)O£NE, 

HUIMN CYTOCHROME P-«ai MRML COMPLETE COS. 

HUIMN CYTOCmOBK P^ MFEOPMEOOOOASE MRML OOMPIET COS. 

HUIMN P«BO MRNA ENC00S4Q MFEDBI1NE OSXhASE, COMPETE COS. 

HUIMN CHOiESTBULSO&CHAM CLEAVAGE ENZYME P4BaSCC MRNA. 

HUMAN CYSTATMC (OVTB) GENE, EXON a. 

HUMM RAOMTED NERATMOCYTE MRNA FOR CYSTEINE PROTEASE 
HUMM MRNA FOR DOPMBME BETAHYOROXnASE TYPE A (EC 
HUMU« MRNA FOR DOPMNNE BETMflTDROKVlASE TYPE B (EC 
HMMN OMZEPAM BMOMQ BtBBTTOR (DM) MRNA. COMPLETE COS. 
HMMN HMM FOR DSL PROTOONOOGENE. 

HUMkN OBL ONCOGENE ENCOOBM A TfMNBFORMma PROTEIN MRNA. 

HUIMN SERUM VTTAMM 04MDINQ PROTEM (MISP) MRTM. COMPLETE 

HUIMN OeOXYCYTOIC KMMBE GDC COMPLETE COS. 

HUMM DTOMM 1 MRNA BCOOMQ HUMAN NEUTROPML PEPTCE I 

HUMM OEFBMM 1 PROTBN MRML COBTLETE COB. 

HUMM OEFBAM MRNA. COMPLETE COS. CLONE mP-1. 

HUIMN OEFBOM MRNA. COMHETE COS. CLONE 

HUMN OESMNOaC COMRETC COS. 

HUBMN MRML FOR OMYDROPTERDME REDUCTASE (HOHPRX 

HUlMNWmJMUPI B BOME REDUCTASE (HDHPR) MRNA. COMPLETE COS. 

HUIMN MRML FOR DOCNNG PROTON (BKMAL REOOQMTKM PARnCLE 

HUIMN TERMBML DEOOCYNUCLEOnOYLTTMNBFERASfi GENE. EXON 1 1 . 

HUMN OtVEROENT UPSTREAM PROTEM (OUP) GENE (OUQk COMPLETE 

HUIMN OYSTROPHN MRNA, COMPLETE COS, 

HUIMN V-OSA REUTH) EARB OEIC 

HUIMN V-B«A REUTB) EAA4 GEfC 

HMMN EPSTEIMARR VIRUS COMPLEMENT RECEPTOR TYPE ntCRa). 
HUMAN CR2«DB1iC»EPVTBMMR VMUB RECEPTOR GENE. EXON IC. 
HUMANCR*C021«30O»STE»**ARRVWUS RECEPTOR MWM. 
HUWN BETA-ENOCTHEUAL CELL GROWTH FACTOR (ECQFWA) MRNA. 
HUMAN ERYn«00 OIFFEHENTMTION PROTEM MVM (S)F)^ COMRXTE 
HUWH CBTfMOKX. 1 7 8ErAO»VOR0QENAaE GENE. COMPLETE COO. 
HMMN eOSINOPHL«ERn«> NEUROTOXIN (EON) MRNA. COMPLETE COS. 
HUIMN 0«aEP1« (PUTATIVE UGAHD OF BEIOOOMZEPINE RECEPTOR) 
HUMEF1A
HUW1AR 



HUMENQA 

HUMENOQ 

HUlCMR 

HUMERWt 

HUMBSAR 

HUMDWTI 



HUMOm 



HUMCn 
HUMCTM 



HUM^IA 
HUMTSM 
HUMFAV 



HMTACI 
HUkPAV 



HUMAN BJONOATION FACTtM EF-t-ALPHA GDC COMPl£TE COS. 

HUMAN mm ran ajONOATioN factor i mma «jmjmt (EF-i 

- LQROmH FACTOR (EOF) 

cqwoi miWAcioR waa tP r oR mwma. c ompiet e coa. 

HUMM MBVMMT (SHORT) BVOMAL GROWTH FACTOR RECBTOR 
HUMAN EAfCY OROVmi RBFOMi 2 PROTCM (BOn^ MMM. COMPIETE 
HUMAN TRMMiOIONM. MTUnON FACTOR (EIF4i AlPHA 8U6UMT 



HUMAN CLASTAME t MMM. COMPiETC COiL 
HUMAN OASTAtf ■ A QOS, DON & 
HUMAN BAVTMt ■■ HRNA, OOMUTE COi. 
HUMAN MNCREATIC ELAETMC lA MRNA. OOMPLCTE CIS. 
HUMAN RANCREATC ELAVTMI » WMA. OOHnro COa 
HUMAN BTTTHROI) laoraRM FROTBN 41 MRNA. OOMPlfTE COS. 
HUMAHSmLCTUW.PnorESI41 WMkCOIMETBCOa 
HUMAN MNCI«Tn OAST AM I MVM SeOUOCC 



HUMAN PRiPROOimAlMiL EXON 4L 

HUMAN ENDONEXM I IMM^ OQMPIETE COS. 

HUMAN ALPHA ENOIASC MRNA, OOMPlfTE COSl 

HUMAN KCUROfMPGOFIC QAMMM MKASe. OOMPl£rC COS. 

HUMM MRNA FOR ERA OtVCOPROrTEM (ERVT»««0OP0rrENT1ATM0 

HUMAN C€f»A MMA FOR THVROD HORMONE RECEPTOR 
HUMAN M fW AWRTWyRP HORMONE HBPPTOR. 
HUMAN ENCMON REMUR FROVEIt MMNA, OOMFLETE COS. 
HUMAN B« FROTEM (ETMEUTCD O0C) MRNA, OOMnCTE COS. 
HUMAN e«B OENE BCOOVn ERU FfWTGH OOMnfTE COS. 
HUMAN EmOQEN RBOTOR MRNA, COmfTE COS. 
HUMAN BrmMPOCTM QENE OMMTE cot. 

HUMAN BimffVJPOCTM QBC OOIMTE COS. 





HUMQFU 



HUMAN E>C0r>gUN»(B)N8X IMM. 



HUMAN BttO«HaM(STH a 
HUMAN ERnHROSUSTOSM VMJS ONDOQDC HOMOtOO t (STB-I) 
HUMAN ERfTHROaUMITOSM VMUi ONOOOENE HOMOtOQ a (CTM) 
HUMAK SffEfTMALFAfTV AGO BMOilQ mCHEMOENK OOMPIETE 
HUMAN UVn MTTV AGO MNDMO PROTEM {MIP) MRM^ COMFIETE 
HUMAN Ul«R FATTY AGO MDMQ PROrrEMjbMS^ MRNA. 
^KJMANMWNA FOR C O W' i l M O irOOWTROLFWIfcW FACTOR (. 
HUMAN OQAQUUTION FACTOR V MRNA. OOMFLCTB C0& 
HUMAN ratATMSOMO FROTBN (FW>> MRM^ COMFtETE COS. 
HUMW POUTMNWa PROTBN (HP) IRWk OOmXTE COS. 
HUMM RSPMOOai A1PHA«HMN MRNA, OOmETE COS. 
HUMAN RMNOODI AALPHACHAMMnNA,O0MPLXTE COtw 



HUMAN ra«>flNWECePTOR (ME RECEPTOR) 



HUMAN NRNA FOR H»4 AFFMTY HE RECEPTOR ALPfMUBUMT 
HUMN nSROOUST COUAODMSE RMBTDR MHM^ OOMPl£TE COS. 
HUMAN HON AFRMTV RCCEPTOR (FOR) n RMr. COMPLETE COS. 
HUMAN HQH AmMTV RECEPTOR (PCRO URNA. COMFIETE C0& 



HUMAN APOFERRmN H QBC EXMB »4. 
HUMAN FB«¥TW H CHAM MRNA. COmfTE C0& 
HUMAN FB»VTM »CAW tUBUNn^ mm. OOHaETE COO. 
HUMAN FB»VTM HEAW-GHNN ODE. E)CNB t, a AM) 4 
HUMAN FCRnmN LCMAM liVM, OOMPl£TE COSl 
HUMAN FERRmN UOHT SUeUNTT MRm.OOMn£TE COSL 



FRffROMBUUmJIC OROHrTH FACTOR I (BM HRM^ 

^QRWrniMCTOR I pSMI VAAANT 




HUMAN FACTOR I (C3aC4B MACTWATOR) Mn«^ COMPUTE COSl 
HUMAN PACTOR RfCHWTIMS FACTOR) HRM^ OOMPIETE COOMQ 
HUMAN FACTOR RfCPffWTMAO FACTOR) MRM^ OOMPIETE COOMQ 
HUMAN FACTOR 8t OM. OOMPiETC COSL 
HUMAN FACTOR 0C ODC EXONB 7 AND A 

H SiA N MRMA r ORt £ UW0 CY T E AWOCI AT B)M0mA£-IAlR»aU>afT 



HRIANMWMAFORnEnONKTiiRCCtPimiAifMAlURUMrr. 



HUMANOMVOROFOUTf RBWCTME OBC EKMCANOy FLANK 
HUMAN OMraROFOUTS NBKCTME 08C (DHFI% EXON S 
HUMAN PaUNTATMOBC tMON il 



HUMAN DMVOROPOUTt ROUCTAtE PBEUD0Q8C (PMMV 



SCOQBS IC*POS)i OOMPlCrB COO. 
OeOBSMraMSBOCHTI 



HTEDFROrOlh ^ 
HUMAN FOUJCIMTMUUrMaH0RM0ISKTAMUMraENE.E)KM 
HUMAN POLUaX VTMUUTMO HORMCN&BCTA ^114 ae«. BtON at 
HUMAN irrOONONORHL FUMRME WNIk COmSTE C0& 



HUMUl MCTOR VI S 

HUMAN COAOUUmOW FACTOR VWiC (AWTI I CMCIil RX FACTOR)" 
HUMAN FACTORMIMRNA. 
HUMAN OOAOUATON FACTOR VR« MRM^ COM\ETE COO. 
HUMAN FACTOR XpUOOOOAQUUmON FAGTOR)OEI« EXON 4 
HUMAN FACTOR » PUSOOOQAOUATICN FACTOfO MRNA, COMPLETE 
HUMAN FIAOBirM. FXMA MMA, OOMPIETE COS. 
HUMAN QMI«>1 OBC OOMPIETE COS. 
HUMAN OOOIM one OOMPIETE COS. 

HUMAN lflK H F RCITlWt l U MAS t CSUBaTRATEHN»IA.COWWEC0e, 
»«MANaLVC8MU»frDMPH00PHATEDEHraR00EM8EMRm, 
HUMAN qtTCTRAU O Iff C ■ FtCSMATE DCWOROgPMaE MHNA> OOMPIETE 
HUMAN QLVCWAtO O IW H PI WSWI A TBOCWOROOEIMOEMRMA. 



li y iQU J DO K IFIBSPII A rE0eM)ROQEMA96»ABWffA.,COMPtETE 
>UMN FOR «UJOQ»MHOBPHATE Oem)ROQENAS£ ^MPOK 
HUMAN IMNA FOR PMCRMT1C CARCMOMA MAmm aA733.l/^^ 
HUMAN OA i TROSfl tt liiALTUMOaAaSOCUTED AMTTQEN QA7M.I 



HUMOTIRA 



HUMANaAAMRNAFORLY80aOMALMP»«M»JUOQSKM8e(ACtD 
HUMAN QAU C TOSfrt^PHOSPMATl URRmTRANWEmSi fOALT) MRNA. 
HUMAN NEUR0NM.OROIVTN PROTEM 43 (OMM^ MRW. OOMPIETE 
HUMAN lACR MMA FOR OAP JUNCTION PROTESi 
HUMAN (MtTRMOBC OOMPIITI GOO. 



HUMAN OASTRMOeC OOMPIETE COS. 
HUMANIWER MRNA FOR KTAOUMJMT MONALTRAMOUCMQ 
HUMAN OROUMPecnC COMPONENT WTAMM D«ND(NO PROTEIN 
HUMAN OUCOCEREIROtCMEE MRm, COMPIETECOB. 
HUMAN QUJOOCEREIROOBMH OM OOMPIETE COS. 
HUMAN LYSOSOMAL GLUCOCEREBROSIDASE MRNA, COMPLETE CDS.
HUMAN GLUCOCEREBROSIDASE MRNA, COMPLETE CDS.
HUMAN GLUCOCORTICOID RECEPTOR ALPHA MRNA, COMPLETE CDS.
HUMAN GLUCOCORTICOID RECEPTOR BETA MRNA, COMPLETE CDS.
HUMAN lEUNOCm m RECEPTOR (FC4MMM40 MRNA, OOIMTE 
HUMAN GRANULOCYTE COLONY-STIMULATING FACTOR MRNA, COMPLETE
HUMAN GENE FOR GRANULOCYTE COLONY-STIMULATING FACTOR
HUMAN MRNA FOR GRANULOCYTE COLONY-STIMULATING FACTOR
HUMAN MRNA FOR GRANULOCYTE COLONY-STIMULATING FACTOR (G-CSF)
MJMAN IMR OUJf AMATE OefmOOeMSE MMV, OOMAETE COS. 
HUMAN OUfTMMTE OemiROaBMH (0014 MRM. COMPLETE COS. 
HUMANQUaMMTE OEMraROOENASE QOHI MRNA. OOMPIETE COS. 
HUMAN OUAL RMUART ACOC PROTEM (OPM>) MRPM. COMPIETE 
HUMAN SMjCRSROKAflTOROWTH FACTOR (VQP) OS KD^ 21 KO 
HUMAN FR CP ROSMUUWiJKEOHOWffMFACTORlCiaMlNRNA. 
HUMAN •0UUMJMeaROWmFACTOR(iaR)OeCE)eON4OF4 
HUMAN QROmH FACTORMWCMLE aAl OENE, OOMPIETE COa 
HUMAN ■WJUMM QRODVrN FACTOROOM) lAODa, EXON 4 
HUMAN MBUUHJnQROMrrHF«CrORaOQF«)CONATOURNA. 
HUMAN MBUUHAC QROMffH FACTOR (lOF) MNOVa PROTON WMA. 
HUMAN MSUUHXE QROWTH FACTOR ■ WMA. COMPLETE COS. 
HUMN iMJUMJKB gROMrm FACTOR BMDMB PROTBN BP-1 MRNA. 
HUMAN ■MULEUJIgQiWWm<»CrORWBCEWORMRI^ 



HUMANOAMMMUirAMVLTrMMPOTIMM (MO MRM. COMPLETE 

HUMANQAMMAOUffAMaTRAIMF lF l tJRUE (OgT)PRO IEM tWNA. 

»MiANQR{MrTNH0RMOM(HaHOOMAT0TR0PO0aBe. OOMPIETE COS. 

HUMANQROWrHHORMOWtOH-l MOflHa)AlfiCH0RnMC 

HUMAN OROmN HORMONE OM 9«H% OOMPIETE COB. 

HUMAN W NAFOROROmH HORMONE RECEPTOR. 

HUMAN ORQWTM HORMONE RBCVTOROeCOONia 

HUMANORawrHHORMO»SVMIMfr(HaMV)aDCA»OFlANKaL 

HUMAN AOMATWE OUANM NUaCOnOMMMO REOUUTORT 

HUMANqa FWOTBN AUiHABUWJNTqtNRE)CNIi 

HUMAN on PROTON ALPHA BUMMT OCNE. E»N B. 

HUMAN PREPROOABTRK BBWrORV POLVF^nOB (QV) MRNA. 

HUMAN MMA FOR on FROTBIMPNA«uaUNIT(ADamATECVaASe 

MMAN STMJLATORT Q FnOTON (OP RKGFTORMaUUTED lU 

HUMAN OMTTAOlJUOOBIMBfiCFROAeTIVATOf^MMA. OOMPIETE COS. 

HUMAN BETA-GLUCURONIDASE MRNA, COMPLETE CDS.

HUMAN MRNA FOR QU momi 

HUMAN GLUTATHIONE PEROXIDASE MRNA, COMPLETE CDS.

HUMAN T>CTA 1-aU0BMOB«. 

HUMAN GLUCAGON MRNA, COMPLETE CDS.



HUMAN MRNA FOR OUffAMATE OBfrOROQENASB (OLLXVn EC M.I.IL 

HUMAN MRNA FOR aU/TMME tVNTHETABB (E.C 4A1.ax 

WMAN (1CP0Q OUCOBB TRAMPORTER OENE MRNA. COMPLETE COa 

HUMAN GLYCOPHORIN A MRNA.

HUMAN GLYCOPHORIN B MRNA, COMPLETE CDS.

HUMAN MRNA FOR ERYTHROCYTE MEMBRANE GLYCOPHORIN A.

HUMAN GLYCOPROTEIN ALPHA-SUBUNIT GENE, EXON 4 AND FLANKS.

HUMAN LIVER GLYCOGEN PHOSPHORYLASE MRNA, COMPLETE CDS.



HUMAN ORANUUXnrTBMACROPNMi OOLOMT BTMJUTWa FACTOR 
HUMAN QBg FOR fWM IB MCY Tl<iACROPIIAa i OOtOMrgTRKJI>^ 
HUMAN <MMimENUCI£Cm«iaBnFROr«IALPNA«aUNITOENE 
>MMNTRANHUCM«LFHA«unMr(QNAQIi«W,O0MFUTEC0Bl 
HUMAN HTPOTHALAMV MRNA BCOOBn 1MB FMOURBOR OF 
M »IAN ( I U ANB a NU L Ll U i aJB MN U M U FR0fTEBtjttO.AtPm)MRFiL 



HUMAN FliOELEraLTDOPROraMMHTACHMIMRML OOMPIETE COB. 
HUMAN PUraETaLYOOFROTCMBALPHAOMMMM^, COMPLETE 
H1.BIAN mXXO FUTEUT M R WRAI B aLtOOPROTBN >ALPW (0PM) 



BBKTAWOOPROTtBIDMRPBLOOMFLETB 

HUMAN F fW aM ANC Y-BPECWO lirAOLVOOPROTEBI EMfBM, CO^IETE 
HI liANIMI Mg jQLflBUUNOFCRBCCPTORMWMA. COMPLETE COIL 
HUMAN OROWfrHHORMOPrnnEABBa FACTOR (QRF)qENE.E»0N4 
HUMAN QRO(QR0«m«ReaUUnDl OflC 
MMANaABTRMRBEA«NaP0TttiMRNIL OOMPIETE COB. 
WBMNTBXnALTONQtUOOI WBaM U T tU FR U IHORPyiKKNE. 
HUMAN I BB IA FOR OOUPlJNaFWOTW CP) MPmoSSiT 
HUMAN MRNA FOR OOUPUNBPR0«BNR(B)AIPmS(«UNfr 
HUMAN MRNA FOR OUITATMOPa FERONOMBB (K 1.1 1 JJA 
HUMAN OUITHATMOPM FEROMBMB aoa. OOftPtETEC 

HUMAN OUrTATHOPSftTr 

HUMAN aUITATHONEB-TI 
HUMAN aU/TATHONEB-ll 

MMANOLUTXTVMaNBB-TRAIMPERMtNABUBUMrtlQBOMFML 
HUMAN QtJtfrim«»gTRMM F B^I t CU<BMU(QBTt)F^^ 

HUMAN BETA lAQAlACTOBVUTI ^ " 

HUMAN OUfTATMONl B-HMPRFl 
HUMAN liB« FORCtARB FtaumrmCNB B^Ti 




IfOBT) 



faBT)HAaUHMTlMRNA. 
HA BUT 



ItMM MRNA BTRMATORT anr^BMOSa FROTGN ALPm BUBUPfT. 

HUMANaL<W0PHaRMAaEICIMCM7. 

HJMAN OLVOOPHORBI B OeC. E)ttN MO ^ 

HUMAN ALPHA OtOBM OM OMTDiaN CHROMOBOMi Ift ZETA 

»ftMANALFHAaU0BMFBMfcFHA^.A&FHMAIOMPHA.1 QENEH 

HUMANKTAOLOBMReaONONOPMMOBOWII. 

W^H P A fM (B»«)BttOR0IWHF^CTOR1(WK>F.i|^E»0Nl 

WA<ANHE FARWO0FA CTORE(HC4)M^00m£TEC0a 

HUMAN HEMOPO^ CBX PROTEM-TYROSME KlMSe (HCN) OM. 
HUktCKS

HUftCU 

HUMHELB 

HUMHER2A 

HUMHEA13 

HUMHF10 

HUMHFSP 

HUMhNQF 

HUMHNQE 

HUlfttSlOa 

HUMH34 

HUMH»H2A 

HUI*f8H2e 

HUliHSHM 

HUMH»H3B 

HUI*f3H3C 

HUMHU1EA 
HUHHLABt3 
HUMHLABeO 

HUUHUOM 
HUMHLAOPB 

HUUHUUQ 
HUMHLASBA 

HUUHUXSeR 

HUMHLDFWB 

HUMHLDOWB 

HUMHUmiB 

HUMHMQ14 

HUMMQIM 

HUMHMQt? 

HUMHMQCOA 

HUUHyOtA 

HUMMIQtB 

HUMHMQYA 

HUMHUQYB 

HUUHMOYC 

HUMHMQYO 

HUMHMPFK 

HUI*C0B3 
HUMHOIM 
HUMHOOtSSa 

HUMHQX8 
HUI»«>1QS 
HUUHPAIB 
HUUHPA13 
HUUHPA2B 
HUMHPAB 
HUMM>AR31 

HUMHPRTB 
HUMHPSIM 
HUMHRAnC 

HUMtSR 
HUI*eC7D 
HUMHSOe 
HUMH8P27 
HUMHSP7D0 

HUMHTQL 

HUMHTHtR 

HUMIAIQt 

HUMWPra 

HUMAPfCt 

HUMCAMt 

HUMCAM1A 



HUMFN1SK 
HUMIFNAOl 

HUMFtMM 

HUMFNAA 

HUMFNAAP 

HUkUFNAB 

HUMIFNAC 

HUMFtMO 

HUMFNAF 

HUMRMFM 

HUMtFNAQS 

HUMFNAH 

HUMFNAI 

HUMtFNM 

HUMFNAIP 

HUUFNAM1 

HUMFNAN 

HUyFfMTA 

HUHFr4ATC 
HUI«F^MTO 
HUMFFMWA 



HUMFNBaS 
HUMFNB2R 
HUMIFNG 



HUMAN HEMOPCNEnC CBL PRCnTEM-TYROeME KMASE (HCK) Q0C. HUliFNN) 

HUMAN IffVU FOR Pf«TEMHC(AU44A-14iCnO(Uj0eUJfA HUUrMM 

HUMAN PAPUjOMAVMA(HPNO TYPE tBENOOONQPROTEMB El AND HUMFNRC) 

HUMANPARLijOMAVnU8(HPV)TYPEtePROrEmSEI^E7. HUMFP 

HUMAN TYROSINE KINASE-TYPE RECEPTOR (HER2) MRNA, COMPLETE HUMtOR

HUMAN BETA-HEMOeAMNOASE BETA^UBUNTT (HEXS) GENE. EXONB 19 HUMK2FIB 

HUMANHF.IOOeCURNA. HUMtQFIIR 

HUMAN HANUKAH FACTOR SERINE PROTEASE (HUHF) MRNA, COMPLETE HUMtOOFCRA

HUMAN HEPATOCYTE GROWTH FACTOR (HGF) MRNA, COMPLETE CDS. HUMIQQFCRB

HUMAN MRNA FOR MTTOCHONDfULMNQEPROTEM. HUMK>QRLAA 

HUMAN QEIC FOR HBT0NEH1(C^ HUM)QHAE2 

HUMAN mMSTONEQENE, COMPLETE COS. HUMOHBO 

HUMAN HISTONE H2A GENE. HUMIQHBP1

HUMAN HISTONE H2B GENE. HUMIQLAM2

HUMAN HISTONE H3 GENE. HUMIQLBV

HUMAN H3.3 HISTONE, CLASS B MRNA, COMPLETE CDS. HUMHRP

HUMAN H3.3 HISTONE CLASS C MRNA, COMPLETE CDS. HUMUetO)

HUMAN CALCIUM ATPASE (HK1) MRNA, COMPLETE CDS. HUMIFUA

HUMAN CALCIUM ATPASE (HK2) MRNA, COMPLETE CDS. HUMIP

HUMANHLA-ECIASSIMR»M. HUliU 

HUMANMHCCLASSIHLA4l3aEteC0MPl£TEC0& HUMLIAA 

HUMAN MHC CLASS I MA-BB CtMIN QENE (At 2 BS^S). COMPLETE HUMLl AO 

HUMAN MHC CLASS I HAflMMTOBC COMPLETE COS. HUULIB 

HUMAN HLA DM BETA-CHAIN MRNA, COiPLETE COS HUMIL1BA 

HUMANHLA-OPWAI GENEANDHLA-OPALPHA-I QENEENON 1. HUMLlBR 

HUMAN MA CLASS U Aim CHAIN OCNEOZ-ALPHA HUMILIBX 

HUMAN CUSS I VHC QENE HLA.eZ7KEX0«G 44 (BRLIQCaiLMEX HUHn.1P 

HUMAN HLA-SB(OP) ALPHA QE^C. HUMiL2 

HUMAN CLASS I UHC QENE HLA437WEX0N3 44 (BRUQCEIIUNE^ HUIIL2A 

HUMAN MRNA FOR HLM) CLASS It ANnOOl DO BETA CmH HUMIL2AB 

HUMAN MRNA FOR HLAO CLASS II ANTIGEN DPM2 BETA CHAM. HUML2B 

HUMANMRNAF0RH>OCLA3SIANT1Q£NDGMn.1BETACmM HUMOARA 

HUMAN MRNA FOR HUU) CLASS I ANT1QB4 0R1 BETA CHAK HUMURBC 

HUMAN NOMmrONE CHROMOSOMAL PROTBNHMQ-UMRI^k. COMPLETE HUMIURXB 

HUMAN N0N4«T0NE CHROMOSOMAL PROTEMMIO-1 4 GENE, COMPLETE HUMU^ 
HUMAN N0H4«TONE CHROMOSOMAL PROFTBN HMQ-1 7 MRNA. COMPLETE 
HUIMN >mDROXY-MIETNYUau;TARyL COENZYME A REDUCTASE MRNA, 
HUMAN HMG4 PROTEM ISOFORM MRNA (HMGI GEIC», CLONE K. 
HUkMN HkKM PROTEIN tSOFORM MRNA (HMGI GBC>, CLONE M. 
HUMAN HMG-Y PROTEIN tSOFORM MRNA (HMQI QENEH CIX3NE 2BL 
HUMAN HM&y PROTEIN I90F0RM MRNA (HMQI QENE)k CLOIC U. 
HUMAN HMO-y PROTEIN ISOFOfll MR»M (HMGt QENE), CLONE 10A. 

HUMAN HMO-Y PROTEIN ISOfORM MRNA (MUIGIGENEX CLONE t ID. HUMHJA 

HUMN MRNA FOR MUSCLE PHOSPHOFRUCTOKirMSE (EC 2.7.1 .ItX HUMtt^ 

HUMAN NUCLEAR ReONUCl£OPROTBNRARnCl£(HNRM>)CPROTBN HUMMP 

HUMAN »8ETA-HY0R0XYSTER0a>0EHVOROQENA3E QENE. EXONl HUMNCP 

HUMAN H0ME06QX QENE (CLONE HHO.Ct«K HUMMCP3 

HUMAN MRNA F0RCPl9H0MeOBOX FROM HOX-3 LOCUS HUMtNHA 

HUMAN HOMED BOX CI PROTEIN. MRNA. COMPl£TECO& HUI0>tMO2 

HUMAN HOMED BOX Ca PROTEIN, MRNA, COMPlETHCOa HUMmHBA 

HUMAN HAPTOQLDBW(HPIF) QENE, EXON&. HUMMF1 

HUMAN HAPTOQUOSIN AU>HA(1S>WA PRECURSOR MRNA. HUMINSOl 

HUMAN HAPTOGLOON ALPHA-1S(HPA1S» MRNA. COMPLETE CfB. HUliNSPR 

HUMAN mPTDGL0eiNALPHA(2FS>WAPflECURSOa MRNA. HUMNSR 

HUMAN HAPTOQL0eiNHPALi>HA4 MRNA. COMPLETE COS. HUMINSRA 

HUMAN MFTOGL0eMQE>C (ALPHAS AU£1£). COMPLETE COS AND HUMMTIQ 

HUMAN DNAWTTN A HEPATITIS B VIRUS SURFACE ANT)GEN(HBSAO) HUMmV2 

HUMAN M>QI MRNA BCOONQ BONE SMAa PROTEOGLYCAN I HUMRSP 

HUMAN mPTOQLOeitfREUTEO PROTBN QENE. EXON S AND V HUMtROT 

HUMAN HypaXANTHMEPHOSPHORIBOSVLTRANSFERASEOfRnaBC HUMSG2 

HUMAN HEPSIN MRNA, COMPLETE CDS. HUM8QA2

HUMAN RETINOIC ACID RECEPTOR GAMMA, COMPLETE CDS. HUMSK

HUMAN HISTIDINE-RICH GLYCOPROTEIN MRNA, COMPLETE CDS. HUMflQ

HUMAN MRNA FOR HISTIDYL-TRNA SYNTHETASE (HRS). HUMJQEBFR

HUMAN HSC70 GENE FOR 71 KD HEAT SHOCK COGNATE PROTEIN. HUMJUNA

HUIMN3BETA4IY0R0KY-fr£NESTERO«}0EHirDROGENASEMnm. HUMKAO 

HUMAN GENE FOR 27 KDA HEAT SHOCK PROTEIN (HSP 27). HUMKA12

HUMAN HEAT SHOCK PROTEIN (HSP 70) GENE, COMPLETE CDS. HUMKERia

HUMAN MH(DA HEAT-SHOCK PROTBN GENE, CONA, COMPLETE COS. HUMICRIO 

HUMAN 90 KDA HEAT SHOCK PROTEIN GENE, COMPLETE CDS. HUMKER2A

HUMAN TRANSFORMING PROTEIN (HST) GENE, COMPLETE CDS. HUMKERBa4

HUMAN MRNA FOR HEPATIC TRIGLYCERIDE LIPASE (HTGL). HUU(ER»73

HUMAN MRNA FOR TYROSINE HYDROXYLASE (HTH-1). HUMCERa

HUMAN lA^ASeOCIATED MWAfiCMT O AMIM OWN QENE, EXON H HUMKERCIS 

HUIMN lAPPOeC EXON 9 FOR ttLETAMYLOOPOLYPEPTCG HUMCEREP 

HUMAN IAPPMRMLR3RnLETAMnonPOLVT>EPTDCPRECUn90R HUMKEREP9 

HUMN VrTBCELUUlAR ADHESION MOlECULE-1 (ICAM-1) UPHf^ HUWtERB 

HUVWU4MAMR GROUP RWKMnuSRECOTCft(»ff1tf)MRm, COMPLETE HUMKIN10 

HUMAN WSUUMOtMHttAI^LOQMfiMENOOPMQOIM-eWDWIO HUMKRTlOA 

HUMAN tMIIHJGaRAOINQBeyMEmfemNA. COMPLETE COS. HUIKUANT 

HUMANtfrERFER0N4OUCEDl7-KIM/lMaMPn0TEmMRNA.C0MPL£TE HUMLACTA 

HUMANINTERFBt0NALPmQENEFMAU>milL0OMPL£TECD& HUUtACTAL 

HUMAN INTBFERON ALPHA OBCFNALAM 7. HUMLAM1B 
HUMAN INTERFERON ALPHA GBC FHALPIM 4B. 
HUMAN LEUKOCYTE NTERFERON (LET) ALPMW^ QBE. 
HUIMN LYMPHOCYTE PREWTERFEnON MRM, AlPm TYPE aoi. 
HUMAN LEUKOCYTE NTEnFEnON(FHALPm)ALPm4MFMA, 
HMMN LEUKOCYTE MICRFEnON(IFHALP»M) MPmC MRML,COMPLETE 
HUMAN LEUKOCYTE MTERFERON (rHAlPHA) ALP»M QBC 
HUIMN LEUKOCYTE MTERFEnON<mALPm) ALWA^ MRNA, COMPLETE 
HUMN LEUKOCYTIC iOERFERON ALfH^ (IFHALPm^ MRM, 

HUMANLEUN0CYTEINTERFB«M(FHALPHA),JCL9HQENE& HUMLAPA 

HMMN lEUNOCYTGINTBVEnON(IFHALPm)ALP>«44 MRNA, COMPLETE HLAAjCI 
HMMN LEUKOCYTE MTERFERON (RMPW) ALPHAS QEIC 
HMMN LEUKOCYTE INTERFBION (IFMALPm)ALPHM QBC 
HMMN MTERFBIOMALPHA CLASS I (HUFHALPHA«-1 ) QBC 

HUMANffTERFEnOMAIPmTYPEr MRNA. COMPLETE COS. HUUCAT 

HUMAN NTERFEnON(FMALPm441) MRNA. COMPLETE COS. HMAjCATQ 

HUMAN LEUKOCYTIC i(rERFEnQNALPHA4«(lfHALP»M4l)MR*M. HUNLCTHA 

HUMAN tNTERFBON ALPHA GENE IFNALPHAK COMPLETE COa HUMLCTNi 

HUMAN MTSVERONALPMOBCrHALPW II COMPLETE COS. HUMUyTM 

HUMAN MTBFERON ALPHA OBCFHALPHAg^ COMPLETE C0& HUMLOHA 

HMMNNTERFERONALPmOOCFHALPmai HMUHA7 

HUMAN iffEWFEWONALPm W AQBC COMPLETE COS. HUMLDHB7 

HUMAN nBnoaLA«rMrcnFcncM(FiMErA-i)aE>c AM) FiAMca. HMUtmti 

HMMN INTERLEUKMt MRNA, COMPLETE C0& HMAOKX 

HMkMNHYBfliOOIMQROWrTM FACTOR (MTEREUMN^MRMk COMPLETE HUMLOLIOO 

HUMAN WTERLEUKHamM. COMPLETE C0& HUMLOLAlt 

HUMAN MIMCINTERFEnQN(tFH^3AMMA) QENE AND FLAMS. HUMLOLRRL 



HMMN INTERFERONHNOUCtBLE MRNA RMQMENT (CDNA 

HUMAN QAMMMNrERFER0M4M)UCtBL£ EARLY-RESPONSE QEfCP-ia 

HUMAN INTERFERON-GAMMA RECEPTOR MRNA, COMPLETE CDS.

HUMAN 4(M(DA tCRATM INTERMEDIATE FILAMENT PRECURSOR QENE. 

HUMAN ieUUHUXE OnOWTH FACTOR MRNA. COMPLETE COB. 

HMMN OF-I GBC EMON 5 FOR MSUUN-UKE QROMON FACTOR I 

HMMN MRNA FOR NSULME-LJKE QROWTN FACTOR N RECEPTOR. 

HUMAN LOWAFFIMTV DO FC RECEPTOR (ALPtM4SC4MUIMA-Ra) 

HUMAN UMVAFFIMTY 100 PC RECEPTOR <BETAfC4MMMfli) MRMk, 141 (BP 

HUMAN tow AFFMTY OO RECEPTOR CDt« (FCORII) MRNA, 

HUMAN n ACTIVE ^VY CkWH EPSUjON-I QENE. CONSTANT REOKM. 

HMMN UM>R0OUCTIVELV REARRANGED n M>CHAIN MRNA V-REQION 

HMMN REARRANGED AND TRUNCATED 10 OAIAM HEAVY CHAM DOEASE 

HUIMN 10 REARRANGED LAMBOM^CHAM MRNA UIC4C0KM SUSQROUP 

HUMAN MRNA FOR O LAMBDA L<»Mm V{Wh»C. 

HUMAN PLACENTAL RIBONUCLEASE INHIBITOR MRNA, COMPLETE CDS.

HMMN INTERFBK»NMXJCtil£ M M) PnOTEM MVM, COMPLETE COS. 

HUMAN INTBVEROMMDUCaLE GENE FVaM( REGION (» KDA 

HMMN 0AMMA4NTERFER0M#OUCaLE PnOFrEfN(IP^) MRNA, 

HUMAN MONOCYTE INTERLEUKIN 1 (IL-1) MRNA, COMPLETE CDS.

HUMAN INTERLEUKIN 1 ALPHA MRNA, COMPLETE CDS.

HUMAN GENE FOR INTERLEUKIN 1 ALPHA (IL-1 ALPHA).

HUMAN INTERLEUKIN-1 BETA GENE, COMPLETE CDS.

HUMAN INTERLEUKIN 1 BETA MRNA, COMPLETE CDS.

HUMAN MRNA FOR INTERLEUKIN 1 BETA.

HUMAN INTERLEUKIN 1 BETA GENE.

HUMAN INTERLEUKIN-1 (IL-1) MRNA, COMPLETE CDS.

HUMAN INTBVEUKtN 3 («^) GENE. COMPLETE COOINQ SEQUENCE 

HUMAN MTERLEUKM 3 (t.-2} QEIC COM>LETE COO. AND FLAMONG 

HUMAN NTERLEUMN 3 GENE. CLONE PATTACIL^CATT, COMPLETE 

HUMAN INTEPLEUKIN 2 {M.-2) QE>e COMPLETE. 

HUMAN MTERLEUKHM RECEPTOR MRNA (LONG FORM]i COM>LETE CDS. 

HUMAN tNTERLEUKM 3 RECEPTOR BETA CHAIN (P70-7S) IMM 

HMMN tNTERLEUKM 3 RECEPTOR GENE. EJON & 

HUMAN MTERLEUNN 3 (L8) OENE, EMON 4. 

HUMAN tNTERLEUKM 3 CIL^) MVM, COMPLETE COO 

HUMAN tNTERLEUKM 3(IL/9» MRM^ COMPLETE COS, CbONE 

HUMAN INTERLEUKIN 4 (IL-4) MRNA, COMPLETE CDS.

HUMAN INTERLEUKIN 5 (IL-5) GENE, COMPLETE CDS.

HUMAN EOSINOPHIL DIFFERENTIATION FACTOR (INTERLEUKIN 5)

HUMAN MRNA FOR T-CELL REPLACING FACTOR (INTERLEUKIN-5).

HUMAN INTERLEUKIN 6 RECEPTOR MRNA, COMPLETE CDS.

HUMAN INTERLEUKIN 7 (IL-7) MRNA, COMPLETE CDS.

HMMN MRTM FOR MTERLEUKM BSF4 (B-Cai DtfFERENTMTKSN 

HUMAN INOSME-r-MONOPHOSPHATE DEHYDROQENASE (IIP) MRNA. 

HMMN CYSTElNE-PROTEIfMSE MM BfTOR (CSTi ) QENE. COMPLETE 

HMMN CYSTEINCPROTEMASE MMBITOR (C8r3)GEIC. EXON 1 

HUMAN IftfBW A^UBUMT MRML, COMPLETE COO. 

HMMN PREPROtMBIN A QENE EXON 3 FRAQMB^ MACROPmOE CEa 

HUMAN OVARMN BETAA INHBIN MRNA. COMRfTE COO. 

HUIMN IKTERFEROH-MDUCIBLE QENE FM4K SfLANK 

HUMAN INSULIN GENE INCLUDING 5' AND 3' FLANKS.

HUMAN ALPHA-TYPE INSULIN GENE AND 5' FLANKING POLYMORPHIC

HUMAN INSULIN RECEPTOR MRNA, COMPLETE CDS.

HUMAN INSULIN RECEPTOR MRNA, COMPLETE CDS.

HUMAN INT-1 MAMMARY ONCOGENE.

HUMAN INVOLUCRIN GENE, EXON 2.

HUMAN INTERPHOTORECEPTDR RETIN0ID4M0M0 PROTEIN (RBP) 
HUMAN MSUUN^CSPONBWE GLUCOSE TRANSPORTBl (aU/r4) MRNA, 
HUMAN BU4K GENE ONTBVERON STMUATB) GENE) ENCOOMO A 
HUMAN MTERFEROMMUCED 15^ PROTEM (BO) GOC EXON 1 
HUMAN BK PROTEIN (EXNBmNQ A SLOWLY ACnVATMQ CHANNa 
HUMAN MRNA FOR SECOND PROTEIN OF MTERALPHA-TRYPSm 
HUMAN M»M FOR LYMPHOCYTE lOE RECarTOR (LOW AFFMOY 
HUMAN CJUN PROTD ONCOGENE ENOOOMO JUK OOMaETE COO 
HMMN BRANCHEDCmm ALMA^ETO AOO OefyOROQEfMSE (E2) 
HUMAN GLANDULAR KALLIKREIN GENE, COMPLETE CDS.
HUMAN KERATM 1 • (Kl «) Q0C. COMPLETE COO 
HMMN mNA FOR KBMTM 1 a 
HUMAN KERATM TYPE I (66 KO) MFVM, COii>LETE COO 
HUMAN DNA FOR aan) KBMTIN TYPE I EXONB l^7A> AND 
HUMAN EROERMAL 67-iaM fCMTIN GENE. EXONB 7. 6. A»0 ti 
HUMNCYT0ICRAT1N 6 IffVM, COMPLETE COS. 
HUMAN MRNA FOR CYT0KERAT1N 1 Bi 

HMMN 50 n) TYPE I EPOERMU. ICRAT1N QEIC COMPLETE COO 

HUMAN Kea (EPIDERMAL KERATW. TYPE N) GENE. EXON ft 

HUMAN POT. PSEUDO^OMTM K1 • TYPE I , EXON 4. 

HMMN KMNOGEN GE»C. EXON IOl BCOOmO BMOVVaMN AND EXON 

HMMN ACtOC KBMTffflO MRNA. COMRETE COS. 

HMMN NU AUraMMMC ANnOEN QENE. COMPLETE COS. 

HUMAN ALPHA-LACTALBUMIN GENE, COMPLETE CDS.

HUMAN PRE-ALPHA-LACTALBUMIN MRNA.

HUMAN LAMININ B1 CHAIN MRNA, COMPLETE CDS.

HUMAN MRNA FOR NUCLEAR ENVELOPE PROTEIN LAMIN A PRECURSOR.

HUMAN COLON CARCINOMA LAMININ-BINDING PROTEIN MRNA, COMPLETE

HMMN LMiNM Bl CHAM mM, OOtfiLETE COO 

HUMAN LAMIN C MRNA, COMPLETE CDS.

HUMAN MRNA FOR NUCLEAR ENVELOPE PROTEIN LAMIN C PRECURSOR.

HUMAN LAW* 1 MRNA ENCOOINQ LYSOSOMAL M CM S n AN C 

HUMAN LAMP4 QBC BCOOtNG LYSOOOMM. MOMMNE 

HUMAN LEUKOCYTE AO^CSION PROTEM (LPA-IAMC-1>P1 W.m 

HUMAN LEUNDCYTE AOHESKM RECCPTOR ALPm SUOMflT MRNA. 

HMMN UPOOORTINM IMM, COMPLETE COO. 

HMMN UPOOORTM-V MRML, COMPLETE COS. 

HMMN MRm FOR T2Q0 LEUKOCYTE COMMON ANTIQEN (C048, IM^ 

HUMAN MRNA FOR LEUKOCYTE COMMON ANTUCN (13001 

HMMN L£Cmfl»4CH0LEBFTER0L ACYLTRANBFETMSE MRNA. COMPLETE 

HMMN QENE FOR LECmtHCHOUSTEROLACVLTIMNBFERASC (LCAT^ 

HMMN LYMPHOCYTE CUT>«m UQHT'C»MIN A MR»M. COMPLETE COO 

HUMAN LYMPHOCYTE CLATHRM LJGHT'Cmm B MRMk OOimETE COO 

HUMAN TQNSILIAR LYMPHOCYTE LD7i MRNA MDUCGD BY TPA OR RM 

HUMAN LACTATE OEHYOnOQENASEA BOKYIC MRNA, COMPLETE COO. 

HUIMN LACTATE DCHVDROOBM8CA QENE. EXON 7. 

HMMN LACTATE OCHVDROQEfMSC B OCrC exON a (GC 1 .1 .1 

HMMN WMA FOR LACTATE 0EHVDR0QENA9E • (lOHO^ 

HUMAN TESTMPGCIFIC LACTATE OEHrDROOENAeC (10HC4. UHX) 

HUMAN MRNA FOR APOLIPOPROTEIN B-100.

HMMN LOW DENBTTY UPOPROTEM RECEPTOR QENE. EXON 1 A 

HUMAN MRNA FOR LDL-RECEPTOR RELATED PROTEIN.
HUML£C

HUHLfUKOB 

HUMUITM 

HUIM 

HUMMM 



HUMAN 14 KD IKTM MRMk OOMPIfTE C0& 



HUMUPH 
HUMUOM 
HUHUMM 



HUMM uvBiQunoiETfMNMiTmM moroN (oujrax 

HUMAN LUTBMZMa HOIMM f»£A8M HORMOM am li^ 
HUMN UPOOOfffM I IMM. OOMFLCTC COS. 
HUMAN MHM ran L#000mWL 
W MANI g > AT 1Ct# A I C IWNA.O0MWJTECI». 
HUMM l£UMOri<M HimnOUaC MIMA, OOMAfTE COS. 
HUMAN LIUNOnVM HVDROIAK MIMA, OOMMG COS. 
HUMUl LYmOMMMCIATED MTlWMNi OLVOOmorEM (LMIP A) 
HUMAN LVMm NOOt HOMMQ fCCB^ MMA. OOMPl£TE C0& 



HUMLPOH 
HUMIMA 



HUMMI B4JP0RVaCNAIf MRNA. OOMPLETC COS. 

HUMAN LIPOAMIDE DEHYDROGENASE MRNA, COMPLETE CDS.

HUMAN LIPOPROTEIN LIPASE MRNA, COMPLETE CDS.

HUMAN MMNA ran L#OmorrBN URASCfK «.1 

HUMM LVKSM IMMk OOMAETE CnVffTH AN AUJ fTCAT M T>C 

HUMAN LVBCZnS IMMk OOMPIETE COtL 



HUMLT 

HUMLTX 

HUMLYL1A 

HUMLVI.1t 

HUM.VN 

fMWMPn 

NUMMACIA 



HUMAN LYMPHOTOXIN MRNA, COMPLETE CDS.
HUMAN LYMPHOTOXIN (LT) MRNA, COMPLETE CDS.
HUMAN LYL-1 PROTEIN MRNA, COMPLETE CDS.
HUMAN LYL-1 PROTEIN GENE, COMPLETE CDS.
HUMAN LYN MRNA ENCODING A TYROSINE KINASE.
HUMN CATIOIMmoan' MMMOaCMHOaPNAT&aPECinC RECEPTOfl 
HUMN MAC-1 OEIC ENOOOMQ OOMABAfT RECEPron TYPE C01 1 a. 
HUMAN MVEUMMMCIATID O L ytO PH CW EJ N(MAa) IMNA. COMPLETE 
HUMAN MM. PfWrON OaC MRM^ COMPLETE COB. 



HUMMHEA 



IFA 
HUMMPIA 



HUMMLCIP 



HUMM MVBM MMC PROreN (IMP) UIVM. COMPLETE C0& 
HUMM MONOCYTE CHEMOTACTI FACTORC AND ACTTVATMQ FACTOR 



HUMMHA2 
HUMMHA9 



HUMMCtMV 
HUMMHCA1A 
HUMMHCAMA 
HUMMCACA 



HUMMHCant 

nc 



HUMMH0C1A 



HUMAN MEnrrUMUMVLOOAMITAflE ailCM) MRNA. COMPLETE COa 
HUMM MMBM^OORTIOOI^^ 

HUMAN P<K.YOOPRarEM (MORI) MRNA, COMPLETE COB. 

q t VtXJP RUm NPa«W)MPNA.COMKETECD& 

HUMAN MRNA FOR MCROBOMAL B^OOODC HffiROLAflE (GB 

HUMN MRNA FOR HCROBOMAL mOOOE HVDROLAK (GC XXZn 

HUMAN ISTAliOrMONDIM QEW (MT^ AND FLAMA. 

HUWN HETAU0THONEBHI PBCUDOQENE (MT-VB^ 

HUMAN METAUOTMOICiHA OEIC COmETE COOMQ aCQUBCE. 

HUMAN MCrALUmtONBi^COENE (MffT-C). 

HUMM UBTAUOIMONDHFOeS OMT^ 

HUMAN METAUJOrHOmi 0klT1»# QEIC. COMPLETE COB. 

HUMAN MET PROTO-ONCOGENE MRNA, COMPLETE CDS.

HUMAN MVOQUMM OENi, OOMPIETE COBi EXQN 3L 

HUMAN MVOOUNM OeC EMN & 

HUMMI MATnXQU PRCrrCN (MOP) MMA^ COWLETE COB. 

HUMkN mUNOLVOOQENPHOBPHORVLABEMMA. COMPLETE COB. 

HUMAN OAIB I TRMMPUNT A710N ANHQEN (HU) QEIC 

HUMM MHC CUBE N COMPLEMENT CCMPOieiT Ca MRNA. COMPLETE 

HUIMN MHC CLAM I HLA^MOEIC. COMPIETE COB 

HUMM tMCCLABB I MAMOBS. 

HUMMI MHC ClABB I HUWO QEHE, 

HUMMMHC OMB I HUUX>AU>NAQENE (0R4^W»). EX0N9 iJiA 
HUMAN MHC CUBB I HUMMM QENE 
HUMtfl MHC CUBB I »t*«1» tUVTVPE Bill (ORENTAL) ODC 
HUMAN MHCCUBB I HA MAM OPg. COMPLETE COa 
HUMAN MHC CUBE I HU^CEIL MBMRANE OLVCOPRCJTEM QEIC 
HBAPKNi MC CUSB I ttAWOBC EXON 7. 
HUMAN MHC CIABB I WAC.1 OBC COMPLETE COOL 
HUMAN MC CLMB I MAA1 CHMN OEIC (A1 A BUI COMPLETE 
HUMAN MHC CUMB I MAA14A CHAM MRNA tA24A, BM^ CMn U 
HUMM NHC CUMB I WAC^UHM CHABI AM) AUBMATI^ MRMk 
HUMAN AimUMC Q0« FOR MV08B«»CAW CNA«l(»frTEn«NUBK 
HUMAN MHC CUMB I »tA«19 CHAM QEIC iAMML^aO: Bt Ut V 
HUMAN IMC C&ABB 1 WABt4 CHMN GENE («W B14ir: CVeX 
HUMAN MHC CLM I WAait CHAM OeC OWOU BlIlWM : CMS), 
HUMAN ftHC CLAN I HU^B44JCHMNQENE(A1 B44,««7: 
HUMAN ftMC CLABB I MAB4ACHMN OEM (A2M A B4a4»X 
HUMAN MC CtAW I HUkMi CHAM QENE (MMBUOt iBH^X 
HUMAN MHC CUBB I WABMM IMA (AUA; BMMt own U 
HUMAN MC CLA« I H ABWM CHMNOENEtAWOMC BII^WMI; 
HUMIN MHC CLAM t »ftA«MHa CHMN CM (AMMSOt BMM%WU)L 
HUMM MHC CLMB I WACB-I OEIft COM>l£TE COB 
HUMAN MCCLMB I MACfra OCN^ OOIMETE COB 
HUMAN MHC CUBE I WACNU CHAMOEHE (NMt KT. CMZ^ 
HUMNQBC FOR CLABB ■ PWAnANT OAMIACHAM (EMON 
HUMM frEnOE»l4miROKyUME p4MtCil I B GENE. COM>LETE 
HUMAN 8TB«30 ai4miR0RVlAaE BOBM. COMPLETE COB. 
HUIMN WC ClAM ■ KMNE E8TERME (HmniBMA. COMAETE 
HUMAN MHC CUBE I MACWI HEAW CHAM QEIC. COMPLETE COai 
HUMAN UHC CLAM I WACMn Q0«. OOMPIETE COB 
HUMAN MHC CLMB I WACMfilCAW OMM OEIC COMIETE COB. 
HUMN MNC CLAM I MAOMS OeC COMPLETE COB 
HUMAN mC CLAM I MAOM OeC. 
HUMN MHC CLAM I MACMO OeC COMPLETE C0& 
HUMAN MHO CLAM I HUUmn 1«TA4 CHAM (Onm 1 J» QENE 
HUMAN MHC ClAM ■ MA0C1 -AUiHA OBC (pRRHWn HRML 
HUMAN WC OAM I HLAO&METAQEW <DRUK 
HUMAN CLAM I HNrrOOOIMTMirv ANTUet O&AtM CWW 
HUMAN MHC CLAM I MAOC^TAOEIC (DWMOW) AM> FLAMCS. 
HUMMI IBCCUM I DNMPHA MRNA, OOMAETE COB 
HUMAN MHC CUM I MA DMCTA OENB (OR«OMLDmM 
HUMMI MHCC&AM ■ DOKTA ftMik COmETS COB 
HUMAN IB» CIAM I WAOOAIMA BENE (DRUMH EMNB tau 
HUMAN MC CUM • DO AIPHA MWf^ COMPLETE COa 
HUMAN MNC CLAM ■ MADMPHAtOOMqi OOIMETE COa 
HUMAN MHC CLAM ■ MAOOBETA MRM^ COMPLETE COtL 
HUMAN HHC CUM ■ DOMTA AMOCWrCD WfTH DRi. DOM 
HUMAN MHC CLAM a DMBTA AMOCMffED WITH DR1 , DOMn 
HUMAN MHC CLMB N DO HTA MRNA, COAPIETE COB 
HUMAN MHO CUM ■ HUMKMETA (ORMMMftOM OQMRI MRML 
HUMAN HHC CLAM ■ HLA4X>BEr A (OOMBX OOHPIETE COB 
HUMIM WC CUM i HU ORBETA CHAM IBMA<OR)) FROM CELL 
HUMAN IBC CLAM ■ »tA ORi METACHMN MIMA, COMPLETE COB. 
HUMANHLAORANmBIALPHACHUNMnNAAA 



HUMMOT 
HUMMPO 
HUMMP07 



HUMMT1BI 

HUMMT2A 

HUMIVC3L 

HUMfVCaU 

HUMfVCC 

HUMMVCOT 

HUMMVCE) 

HUMMVCFS 

HUMfVOai 

HUMfVCUA 

HUMMVCM 



HUMMVCRT 

HUMfVCTM 

HUMMVCTR 

HUMMTUI 

HUMMVLCA 

HUMMVICB 

HUMmCC 

HUMMVLVI 

HUMMV0L1 

HUMMTP 

HUMCA 



HUMCF1A 

manj 

HUMNFM 



HUMNQFR 



HUMNMirC 
HUHMffCA 



HUM»^ 
HUUNPVA 



HUMQAT 
HUMQATM 



HUMOOC 

HUM0P8 

HUUQTC 

HUMOrCIO 

HUHOnfil 



HUHP591I 



HUMM IBCCLAM I WAWWttBOClATB) OLVtUPWIIbM KTA- 
HUMM HLMM ALPHACHAM MRNA. 
HUMAN WC CUM I DRBETACHAM WBM ^WR^ CUONE 
HUmN MHC CUM I DMETACHAM MNA paiW% CUOM 
HUMM IBC CLAM a DR KTA MRML COIMTE COBl 
HUMMNHCHUCLAMIOMETA-t (DBft4)MRML 
iaWAD R W Rl ABBO CI A T I 



ATB> OLVCOPRCrrEM BETA- 

HUMM MAOR MTOEN KTM MRNA. 
HUIMNMHCCLMBNLYMPH0CVTBANnQCN0fWMU>HA.1 QENE, 
HUMM IBC CUM ■ MAORBCTA M»M ftRMOWMim OOWSi 
HUMAN M»C CLAM I nAUMffia IMA DOWI-BBr^ OOMPUtTE 
HUMM kBC CUM I HU0RBETA1 MMA. COMPIETE COa 
>AaiANIBCCUMaHUCRB6TAglNMA.LUPilli:C0B. 
HUMM IBC CUM a ANTUOi HLAOR AtmA »CMrr CNABi 
HUMAN AAOR ANHQEIMBBOCMTB) a«VAf«ANT CHAMB P33 AND P95 
HUMAN IBC CLAM a LYMPHOCYTE ANnOCN DnM4ErA>1 QEfC 
HUMM* MAOR AlPHACMAM (CHMRPM) QEPC ENOM l; Ik 4 AM) 
HUMAN MC CLAM a HUORBETA (DRMnnnOM D0M9» MRNA. 
HUMM MHC CUM i MAtfMnMETA MRNA, COMPLETE COB, CUME 
HUMAN MAOB ALPHACHAM MNA 

HUMM IBC CLAM 1 MA«9UiPHA OENE (ORMM)^ ENON 4 
HUMM MHC CUM I LYMPHOCYTE ANTttEN 

HUMAN MMOR IMTOOOMMCnBEJTY CUM a ANTttEN OAMMA CHAMl 
HUMAN MCI MRNA. OOM>iETE COB. 

HUMAN MIGRATION INHIBITORY FACTOR (MIF) MRNA, COMPLETE CDS.
Wmii aOBIM MHMA. OOMPtfiTE COBl 

HUMAN MULLERIAN INHIBITING SUBSTANCE GENE, COMPLETE CDS.

H UaMNM re WFORMYOBMUOMTCHABU (IBJD-1F> 

HUMM MRMk FOR «NmCUAR MVOBM UOHT CHAM a. 

HUMWI a»«lA MfOBM UGHT CHAM (MLCa) IMB^ COMPLETE COB 

HUMAN MRHA FOR MYQBM IMHT CHAM • (MUMFX 

HUMAN ALXAU MVOBM LJOHT CHAM 1 . OQMUIt COB. 

HUMNAUMUMVOBMLMlKrONAMIlOOMPIETECOa 

HUMAN MATRDC MET ALiCPROrrBNMBa (MM) IMNA. COMUTS COB. 

HUMAN MRNA POR MN aUPBIOnOE OMMUr AM (BC t.iai.LX 



HUMAN MRNA FOR MOHLM PRECURMA 



HUMAN MMA FOR 
HUMAN CATWMB aJ CPCN DCW r 



Nil 

(eci.ii.i.T^ 

MAMM BPHOBFHATE RECEPTOR MRNA. 
(CCMJCII). 



lATION MMmJRY FACrORRaATB) PROTEM 1 4 (MRP14) 
HUMAN MRNA FOR CAiaUMBMOan PRCraMM MACROPHAOa |MRP«) 
HUMM MUMTION MHVTORY FACTORREUTED PROTEM a (MRPa) 
H ,iMN M ff M FOR IMN T^ N E M aUH JMUU E OMMUTAM (BP 1.1 A.1 .1). 
HUMAN MET ALLtmaONBN M QBC EXM a 
HUMAN METAUOTHONOHB OEM (MT-MlL COMPLETE COB. 
HUMN l^MfC PROTON QBC COMPLETE COa 
HUMAN (BLJa TRANMOCATB) T(B14) CMTCONOOaeC ENON tAlt) 
HUMN (UMM) &MVC PROTOCNOOOeC COMPUTE COOBO flEOUENCE 
HUMAN (DAUOa TRANnOCAT8)Tpi4) C4m ONOOQDC MRNk 
HUMN (ACMMi>C4frC PROFOONOOaENa ENQN a AND FlAMa 
HUMN FETAL UVER C4ffVC PROrOCNOOQENC ElttN a AM) FLAMC& 
HUMAN tCa4 OERMLME CMVC PROrCMMOOaDC. E)(ON a AND a 

HUMAN mcLioeK, comETE coa 

HUMM (NB«B)e«IVC PROrOCNOOQeC MRNA. 

HUtMN &MTCPM MRML BBTIATBa FROM PROMOTER Pa 

HUMAN (MJI TRANaUlCATB) n«14) C4frC ONOOQCM. COMPLETE 

HUMMIRANBUXATOMBBOCMTB) MVC AUSLI OP MCMLS ONOOQENE. 

HUMAN UNREARMNaCD MVC AUEIE OP anW^ONDOOBM, COMPLETE 

HUMN MVOOM AIXAU UQHT CHAM (ATRMUMRN^ OOMPUETE COB. 

WMAN lOBB TLIi BMnmH MUBLLIl MJWU MMOMNLUKT CHMN 

HUMftMit!WMUBCiiiiw(MHAUWiBHf 0WMMR^^aPE^■ 
HUMANBM0O^^M^MlE^m)M^MJ^A^LJaM^a■^M^^18M)laBaL. 

HUMAN MYOMN ALKAU LJOHT CHAM (VBBRKUAR) MRNA. OOMAETE 
HUMAN IffMA FDR >CNrRCUUR MWBM LJOKT CHAM 1. 



HUMAN ICUTROPML CYTOCmOME ■ LMMT CHAM Paa PHAQOCYTE 
HUMAN NGUTROPIM. CYTOaa FACTOR 1 (NCFmIRQ PROTEK 



HUMAN QBS FOR NEURORLAMENT BUBLBBT 1^4. 
HUMWI OBC FOR NEURORLAMBIT BUMMtr H (NFMi 
HUMAN WnVB OROMrrH FACTOR BBTA(BErAMaP) QM. a 
HUMAN NERVE GROWTH FACTOR RECEPTOR MRNA, COMPLETE CDS.



HUMAN NEUROLEUKIN MRNA, COMPLETE CDS.

HUIMKmD(P)HMENmONEOWDORB)LJCTAMMWaV.COMPI£TECM. 

HUUM MiVC OEIC E)CNi t AND a 

HUMAN Q0M LME MMCQBC. 

HUMAN NUCLEOPHOSMIN MRNA, COMPLETE CDS.

HUMAN NEUiaWH*Yf<PY)PRKUnaORMnNA 



HUMAN (a<a) ouao A avimcTAK B Q 

HUMAN ORMTHMEA 



ME.E)«M7AN0FUNKa 
IMRMLCOaMTECOB. 
EMRMLOOaaiLETECOB. 
HUMAN MAM ONOOBTATM M MVaV, E)CN a 
HUMAN ORNITHINE DECARBOXYLASE MRNA, COMPLETE CDS.



HUM>1P 
HUMPM 
HUMPt9C17 
HUMP4B9Ct 



(OIC) 

(oro OBM, saoN to MB) J 

M I QMS. COMPLETE COB. 
HUMANi aa MFORtEUHDCYTEAOHEMONQ ftO P ROH MPiaoja 
HUMAN PI MATRRPROTEMMRNILOOmETECOB 
HUMMI HOUBElCDma PROTEM OM pa OOMPLETI COa 
HUMAN MaoatVlA-l (VTEROB 17^ALPHA«fVDRQMnAB&17J0 LYA8E) 
HUMAN GYTOOaUME P^aattOCL CHQUBTEROL DCHMLABE. EaON t 
HUMAN MRNA FOR PLflCMaTRM(P<7)> 
HUMAN CCLUULAR PHOaPHQPROnEM PU aM. laOH 11. 
HUMAN P53 CELLULAR TUMOR ANTIGEN MRNA, COMPLETE CDS.
HUMAN P53 CELLULAR TUMOR ANTIGEN MRNA, COMPLETE CDS.

HUMAN iMNA FOR moroN Faa 

HUMAN HELANOfeBUUBOCHTEO ANnaO«P«7 (MELANOriMNBFBVWO 
HUMAN PL Aa iaNO Qt NACTIV A T 0R MaMT O R-1(PAH)iaMA. 
HUMAN PLACEHrALRASaNOQOIACTnMTOR MMffO R MRNA. 
HUMAN PLASMINOGEN ACTIVATOR INHIBITOR MRNA, COMPLETE CDS.
HUMPAOM


HUMAN PLASMNOGENACrrVATOR tfMBrTCXV2(PAM) OCNE. EXON 


HUMPROF 


HUMPABB 


HUMU4 PlASIiNOQEN ACTIVATOft iMBfTOR 3 (PA^ URNA. 


HUMPROLA 


HUMPAfiR 


HUMN MPNA FOR ARCV6CRPIN (PlAflMMOQEN ACTIVATOR#tMTOR 


HUMPROLEU 


HUHPAIA 


HUMAH PLA8MM0QCN ACTIVATOR MiBTraA-l QBC EXONB 2T0 


HUMPROTI 


njMPjun 


HUMM IMM FOR ENOGTHEUAL PIA8MM0QEN ACTIVATOR MHMTOR 


HLM^ROTP 


HUMPALA 


HUMAN PREALBUMIN MRNA, COMPLETE CDS.


HUMPRP 


HUMPALft 


HUMAN PREALBUMIN MRNA, COMPLETE CDS.


HUMPRPM 


HUMPALC 


HUMAN SERUM PREALBUMIN GENE.


HUMPRPA 


HUMPALD 


HUMAN PREALBUMIN GENE, COMPLETE CDS.


HUMPRPS 


HUMPALR 


HUMAN MUTANT PREALBUMM QENE DtRECTLV UNNB) TO FAJMUAL 


HUMPflPC 


HUMPALFAP 


HUMN PREALBUMN MRNA M MOMOUALS WfTH F AMKiM. 


HUMPRPO 


HUMPAM 


HUMM PUCENTA ANnCQAOUANT PROTEM PP4 MWA, COMPIETE CO^ 


HUMPRPE 


HUtPAP4A 


HUMN PROTBN PP4>X hBMk COMPLETE COS. 


HUMPRPF 


HUMPASB 


HUMAN QIVOOPHORM C (PAft^ MRTK OOMPIETE COOl 


HUMPRFHI 


HUMPtOOR 


HMMN MM FOR PORPHCaiUNOQEN DEAMMMBE (PB(M>. EC 


HUMPRPH2 


HUMPBODRB 


HU^U<liWMFORNO»»€HyTHR0POgTICPOWPH0SI.IN0nEW0EAMt4ASE 


HUMPRPSB 


HUMPCAR 


HMMN MRm FOR CALCIUM DEPB«)ENT PROTTBhSE (SIMa SUSUMTX 


HUMPS3 




HUMM PROLFEIMTMa CEU NUCLfiM ANTKBI aaC COMPLETE COS. 


HUMPS2Q3 


HUMPDQA7 


HUMAN PLATELET-DERIVED GROWTH FACTOR A CHAIN GENE, EXON 7.


HUMP9QIA 


HUMPDOfM 


HUMAN PLATELET-DERIVED GROWTH FACTOR (PDGFA) A CHAIN GENE


imnygA 


HUMPDQFAR 


HUMAN MRNA FOR PLATELET-DERIVED GROWTH FACTOR A CHAIN


HUMP0Q3A 


HUMPDQFAM 


HUMAN PLATELET-DERIVED GROWTH FACTOR A TYPE RECEPTOR MRNA.


HUMPSPA 


HUMPDQFBA 


HUMM PLATRET-DCRi^ OROVmi FACTOR BETA (P00F4) IMNA, 


HUMPSPB 


HUiPOQFR 


HUMAN PLATELET-DERIVED GROWTH FACTOR (PDGF) RECEPTOR MRNA.


HUMP8PBA 


HUWOQFRA 


HUMAN PLATELET-DERIVED GROWTH FACTOR (PDGF) RECEPTOR MRNA.


HUMPSn 


HUHPOHA 


iMii^rtrTrtr>'jiTrDOiTnnrrnrwinr i^nvifflmftwTii>tr.fTiiirirTr 


HUMPSTIA 


HUMP(>Mia 


HUMM PYRUVATE DBfTOROQENASE El AlPHA (PWCI -A) SUeUNTT QENE. 


HUMPST1A4 


HUHPOHB 


1 n PmUVflTf CtfHTnnOQFM^ftF Pfr^ yp***. fr*m^ fTf 


HUMPTAA 


HUMPEP* 


HMMN PEPSMOQEN GENE. EXON ti 


Huwme 


HUMPEPAM 


HUMAN PEPBMOQEN A 0 &0) QENE. EMON ft 


HUMPT>fl. 


HUMPEPCt 


HUMM PEPSBCXIBI C OCNE. EXON li 


HUMTHLI 


HUMPEPCM 


HUMN PEPSMOQEN OENE EXON ft 


HUMFTHUM 


HUMPEPO 


HUMAN PROLIDASE (IMIDODIPEPTIDASE) MRNA, COMPLETE CDS.


HUMTHRP 


HUMPCR 


HUMAN P0VCRM MRM^ COMPLETE COS. 


HUMPUMPi 


iMiimuiMii 


HUMAN MRNA FOR GLUTATHIONE PEROXIDASE.




HUMPFKM23 


HUMAN MUSCLE PHOSPHOFRUCTOKINASE (PFKM) MRNA, COMPLETE CDS.


HUMPYWASA 


HUMPQAMU 


HUMAN PHOSPHOGLYCERATE MUTASE MUSCLE-SPECIFIC SUBUNIT MRNA.




HUMPQAMiQ 


HUMAN PHOSPHOGLYCERATE MUTASE (PGAM-M) GENE, COMPLETE CDS.


HUMQPC 


MJMPQK2 


HUMAN PHOSPHOOLVCERATE lOWTC (POK) MRNf^ EXONB S TO lAST. 




HUMPQK2Q 


HUMAN TEVTS^PECnC PQK4 ODC POR PNOSPHOQLVCERATE KINASE 






ii»iiM¥iiMfrnPHfviPHf>»*vPFnATFinMAnrf>gi* gwmii 

rV^HR^ ^TW^nKW r ww^mM 1 ^^ifcl ^^^^W 1 C il^^ni^C ^^bV^E* II. 


IftMRAUrC 




U| AAUi UDMA BQO pogrgw OEIC PRODUCT fPOPI UL 




HuypQfln 

HUMPQOA 


njMAM mnQfTTTrnnif nrfirrron mrna, complete cos. 

HUMAN MfMA FOR PIASMAQELBOUK 


HUMMR 


HUMPHH 


HUMAN PHENYLALANINE HYDROXYLASE MRNA, COMPLETE CDS.


HUMRASFA2 


HUMPOFSA 


HUMAN PflCPR(>#ttULIN4J(E OlOMfTH FACTOR H (PREPRCMQF-I) 


HUMRASFAB 


HUMPMM 


HUMAN PIM-1 ONCOGENE MRNA, COMPLETE CDS.


HUMRASH 


HUyPMIA 


HUMAN PM-I PROTOONOOQEfCOENt COMPLETE COS. 


HUMRASKSS 


HUMPIP 


HUMAN PROLACTIN-INDUCIBLE PROTEIN (PIP) MRNA, COMPLETE CDS.


HUMRASMM 


HUMPKA 


HUMAN MRM FQRCAMP4)EPBOENT PROTEM KMA8ECATM.YT1C 


HUMn^M 


HUMPKB 


HUMAN BETA TYPE PROTEIN KINASE C MRNA, COMPLETE CDS.


HUMRASR2 


HUMPKCB1A 


HUMAN MRNA FOR PROTEIN KINASE C (PKC) TYPE BETA I.


HUMRA3RP 




HUMAN MRNA FOR PROTEIN KINASE C (PKC) TYPE BETA II.


HUlfflASRPB 


HUMPKLA 


HUMAN PYRUVATE KINASE TYPE L MRNA, COMPLETE CDS.


HUMRBP 


HUMPNHBL 


HUMAN M2-TYPE PYRUVATE KINASE MRNA, COMPLETE CDS.


HUMRBPC 


HUMPIA 


HUMAN PLACENTAL LACTCQEN HORMONE: HPl/3 GENE AND FLAMCa. 


HUMRBPt 


HUiiPU2M 


HUMAN PANCRBkTC PHOSPHOUPASE A4 (PLA«) OENE. EXON 4 


HUMRBS 


HUmASRA 


HUMAN LUNQ PHOSPHOir ABE ^4 (PLA^ MRMk OOMPIETE C0& 


HUMRBSA 


HUMPIANO 


HUMAN PLAKOQUiem kSVM, COMPLETE COS. 


HUMRBSY7B 


HUMPU^ 


HUMW PLACENTAL ALNALJNE PHOSPmTAS&UKE GENE. COMPLETE 


HUMRCCIA 


HUMPlAffT 


HUMAN L-PLASTIN POLYPEPTDE MRM^ COMPLETE COS, CtOfC Pi 


HUMRCC1B 


HUMPLA8TA 


HUMAN ^PlASmN POLYPEPTDE MRNA, COMPLETE COS. CUXC P107. 


HUMRCYP9 


HUMPUT 


HUMAN TISSUE-TYPE PLASMINOGEN ACTIVATOR (T-PA) MRNA.


HUHREOA 


HUMPIAX 


HUMAN PUU( MRNA. 


HUMRQMI 


HUMPIA 


HUMN PLACENTAL LACTOQBI HORMOIC MV« MRM. 


HUMREIMS 


HUMPUUt 


HUMAN PLATEtETMCMOnAICQLYtOPWHJNMMRMIk COMPLETE COS. 


HUMRETPON 


HUMPtPI 


HUMUl iraJN PROTEOUPO PROTEM QBC EXON 7. 


HUMRETREC 


HUHPlPA 


HUMAN PROTEOUPO) PncrrSN MRNA (PIP]L COMPLETE COS. 


HUMRFPA 


HUMPIPOM 


HUIMN CNB MITEUN MAJOR PROTEOUPO OOMROICNT (0I»«) MR»«k 


HUHRHOe 


HUMPIP8PC 


HUMAN PUiOMRT BURFACTMfT PROTEM C QETC. COMPLETE COS. 


HUMRH066 


HUMPMtUeA 


HUMAN PLATEIET MEMBRANE OLYOOPROTQN MA BET A SUSUMT MRNA, 


MJMRHOO0 


HUMPMQR 


HUMAN MRNA FOR PLASMINOGEN.


HUMRSIR 


HUMPMPCA 


HUMN PLASftM MEMBRA»C CA2« PUmiQ ATRASE MRMk COMPLETE 


HUMRBOAO 


mrwiT 


HUMAN PHENVLETNANOUUyBC NMETHnJIMNBFERASE MRM^ COMPLETE 


HUMRNAPn 


HUMPNMTA 


HUMAN PICimETHANOLAMDC IMIET>mJIW4BFEIM8E OENE, COIffUTE 


HUMRNP70A 


HUIi>NU 


HUMAN PURME NUCLEOStOE PH08PH0RVLASE (Pf^ MR»M, COMPLETE 


HUMRNPTOK 


HUMfAM 


HUMAN PURME NUCIEOSOE PHOSPHORVIASE QENE, EXON ft 


HUM1NPA1 


HUMPOLt 


HUMAN POLYMERASE BETA MRNA, COMPLETE CDS.


HUMROOQ 


HUMTOtDWM 


IUBWI ONA POLYMERASE AtPHMUBUMT MRM. COMPLETE COS. 


HUMROOSA 


HUMPOLP 


HUMAN POLY(ADP-RIBOSE) POLYMERASE MRNA, COMPLETE CDS.


HUMRPIt 




UUMAU MQIilA fm PCLVA AM]tHa pQCffPH 
^Amff r\*iw nnuHwa i liwi 


HUMRPHOSA 


HUMPQMC 


■■■iiM***w*'v»fi>Mnrf>fTnM(rfftirifirw' fOMTinTrfin 


HUMRPUS 




uiattMPOnrtCif^giAMnMPfiMiPifliiCtflgMP ewfiM* 

niMMIf raMJVMMCIMWAA^ 1 f P^MMw/ VCnE, 4k 


HUMRP811 






HUMRP9I4 


HUMPOVM 




HUMRPS17 






HUMRPSITA 






HUMRP9CA 


HUMPP14A 


HUMAN nACcNlALmwitm !• ( WH*) ■■•1^ WWIrttIt CM. 


HUMRPZHZI 


HlMffPIB 


u ikAAkK^me eMSBAia/fM A/*ckfrAJ !■ w il 1 ■! lai 
rMWNUKI«rOHt^B|fV<MMIALmuiul ^9}, 


HUMSAA 


HUMPPM 


HUMAN PHOIbiH PHOSPHATASE aA MRNA, OOMKETE COS. 


HUMOAACT 


HUMPPAnPO 


>UaMiACIOCnBOO0MALPH08PHU>WmNP0IWNA. COMPLETE COS. 


HUMSAAM 


HUMrpAfW*1 


HUMAN /ClOICRttOSOMALPMOSPMOPROTEMPHWNA. COMPLETE COS- 


MMSAl 


WHPPAm 


HUMAN ACIOICRISOSOMALPHOaPHOPROTEH pa MRM^ COMPLETE COB. 


HUMSAP1 


HUM>PE 


HUMAN PANCREATIC PROTEASE E MRNA, COMPLETE CDS.


HUM9APA 


HUHPPCPt 


HUMAN SWW PCLtJvTDE B MmA, w^wxriK um 


HUHSAPR 


HUMPPKKA 


NUOEOnOE SEOUBCE OF TIC CONA BSERT OF UI0DA PN1» 


HUMsaasR 


HUMPPP 


HUIMN PANCREATIC POLYPSiTlOE (PP) AND lOOSAPEPTDE PRECURSOR 


HUMSS3B4 


HUMPPPA 


HUMAN PANCREATIC POLYPEPTIDE GENE, COMPLETE CDS.


HUMSB4a4 




HUIMN PROTBrTIVE PRCrTEm MRML COMPLETE COS. 


HUMSCAO 


HUM>Pft 


HUMW VTTMIM mPENDEHT AASMA PROTBH 8 MRNA, COMPLETE 


HUMSCAR 


HUMPPV 


HUMN PANCREATIC POLYPEPTSE Y MRNA, COMPLETE COS. 


HUMSECPl 


HUMPfVUI* 


HUIMN PREA4 QENE FOR AUHEMER9 OeCASE A4 AMVLOlO PRCrrSN 


HUHSECPA 




HUhMN PROTEM C GEre EXON • OF ft 


HUMSECT 


humpSca 


HUMN PROTBH C QEIC COMPLETE COS. 






HUMN PROrBNC MRNA. COMPIETC COS. 


HUHSERDHY 


wMpm' 


HUMN PRS GENE (ALT. HPQ) ENCOOINQ HEMOPOEinC PROTEOGLYCAN 


HUMSGN 


HUMPR. 


HUMAN PREPROLACTIN (PRL) MRNA.


HUM9QLT1 


HUMPna 


HUMM PROUCTW QETC EXON & 


HUMSMLOB 


HUMPnOO* 


HUMAN PROLYL 44m}RO0(nAaE BETA.SUBUMT AW OlSULrae 


HUMSeM 



HUMAN PROFILIN MRNA, COMPLETE CDS.
HUMkNCAnCPSM LQENE. COMPLETE COS. 

HUMAN SECRETORY QMNUIE PROTEOQLVCAN PEPTDE CORE MRNA. 
HUlMNTESnSSPECrC MRNA FOR PROTAMmE t (PI). 
HUMN ENDOMEMBRANE PROTON PUMP SUBUMT HRMi, COMPLETE C08L 
HUMAN PRION PROTEIN (PRP) MRNA, COMPLETE CDS.
HUIMN PRKM PROTEM 27^ MRNA. OQMPLETE C0& 
HUMAN PRH2 UOCUS SAUVARY PROUNERKH PROTEM MRMk (PRt 
HUMN PRH1 UOCUS 8AUVMTY PROUNE-RBH PROTEB* MRNA (Pr 
HUMAN PRB1 UOCUB SAUVM1Y PROLJNE-RCH PROTEM MRM. CUONE 
HUMAN PRB1 UOCUS SAUVWIY PROLME-RRH mOTEM MRMk, CUONE 
HUWM PRil UXUB SAUVARV PR01JC4BCH PROTEM MRm, CLONE 
HUMAN PRB4 UOCUB 8AUVARV PROUN&RCH PROTEM MRM. COMPLETE 
HUMAN PRHI GENE (M»TYPE SUBFMSLY; ALIEIE PfM ^) 
HUMAN PRH2 QEfC (HAEK-TYPE 8USFAMN.Y: ALLELE PFM-1 ) 
HUMAN MRNA FOR PH0SPH0RK3S0SYL PYROPHOSmATE SYNT>CTASE 
HUIMN P81 MfMA, OOMUTE COa FROM BREMT BREMT CANCER CEU 
HUMM eBTnOQENRESPONB^ OeC P8B EXON ft 
HUMAN PREQIMNCY-SPGCmC BETA-I^GLYOOPROTEMFETAL UVER 
HUMAN PREONANCY-SPECmC BETA-I-GLYOOPROTEMFETAL UVER 
HUMAN PREGNANCY-MCIFtC BETA-I^GLVOOPROTENmAL UVER 
HUMN PUMONARY BURFACTANT-ASSOCIATB) PROTEM MVM. COMPLETE 
HUIMN PUUIOIMRY SURFACTANT-AflSOCIATB) PRCFTEM MRMV, COMPLETE 
HUMAN PUiOMRV SURFACTANT ASSOCWTEO PROTEIN PSP4 MRM. 
HUMN PANCREATC BGCRETOHV TRVPSM BMBOOR (PSTD MfMA. 
HOMO SAPCNB PSn MRm FOR PANCREATIC SECRETORY IWMITOR 
HUIMN PANCREATIC BGCRETORV TRVPBM BSMTTOR (PSTT) QBC 
HUIMN PROTHYMOSINAlPm MRNA (PROT^ALf^ COMPLETE COS. 
HUMAN PARATWROIO (PTH OEIC OOOStt REQKM AND VFLAMC 
HUIMN. RARATHVROttUJICPROrEBItASaOCMTEDWTTN HUMORAL 
HUMN RARATHTROID HORMON&mC PROTEM (PU>) OEIC EXON ft 
HUMN RENAL CARCMOMA PAfMTHtVMO HORMON&LKE PEimOE MRNA. 
HUMAN PARATHYROlO HORMONE-REIATEO PROTEIN MRMV, COMPLETE 
HUMAN PUMP-1 MRNA HOMOUOOTO METAlLOPRgTEINAaE. COOAOEWSE 
HUWN PVRWATE DEHV0R0QEIM3E ALPHA SUBUNTT MRM. COMPLETE 
HUMN PROLYL 44m)RO0(YLME ALPHA SUBUWT IMC COMPLETE COS, 
HUIMN PROLYL 44fVDR09CrLME ALPIM SUBUNT IMC COMPLETE COS, 
HUMM MTTOCHQMIRM. UBKaUSOC-BBCXNO PROTEM IliVC COMPLETE 
HUMAN RABS IMC YPT14CLATED AM) feCHBBI OF RAB FMMLY. 
HUMAN Mf»M FOR FMF ONCOQEie. 

HUMAN CaUJLAA RETWALDEHTOE-BSStlQ PROTEM MRMV, COMPLETE 

HUMAN fMP2 MRNA FOR RA»flaATEO PROTEML 

HUMAN MRNA FOR RETMOC AOO RECEPTOR 

HUIMN RETINOICACtO ReCEPTOaCPStXM MHMV, COMPLETE COS. 

HUIMN RASFA PU2 QBC ENOOOtia SYNOVIAL PHOSPHOUPAflE. 

HUMAN RASFA PLAS MRNA, CdVLETE COS. 

HUMAN &HMW81 PROTOONOOQENE. COMPLETE OOOINQ SeOUBCE. 

HUMAN CELLULAR PROTOONCOQCIC &I04MS2, ENON 4A AND FLAMC& 

HUMMN NRA3 PROTOONCOQENE. EXON 4 

HUMM CELLULAR NRAS PROTOONCOQEIC. EXON 4 

HUMAN RA^ GENE, EXONB 3 THROUGH ft 

HUMAN RA94CUTE0 PROTEM RAP1A MRNA, COMPLETE COS. 

HUMAN RA&mATEO PRCTEN RAPIB MRNA, COMPLETE COS. 

HUMN RETMOL amOSn PRCrTBN (RSP) MRML COMPLETE C0& 

HUMAN CELUULAR RETMOL^mOtM PRGTEM IMC COMPLETE COS. 

HUMAN WTERSnTlAL RETBOL BBONG PROTEM ORSP) IMC 

HUMAN RETWOSLASTOMA BUBCEPTIStiTY IMC COMPLETE COS. 

HUWN RETMOBLASTOIM BUBCEPTWLITV PROTEM MRNA, COIMJETE 

HUMM MUTATED RETWOSLASTOMA BUBCEPnSftJTY (RB) IMC 

HUMAN MPNA FOR CaXCVCtE GOC RDC1. 

HUMAN MRNA FOR COLCVCLE OEIC R0C1 . 

HUMAN MRm FOR CYTOCHROME ^4S0 (CVP3 UOCUBV 

HUMN BLET OF LANQERWNS RCQENEIMTMa PROTEM (REG) MRNA. 

HUMM PREPRORBAXM H2 IMC COMPLETE COS. 

HUMAN RENN GENE, EXON 1ft 

HUMAN RET PROTOONCOQOCIMM FOR TYROBB C WNASE. 

HUMAN MRNA FOR RECEPTOR OF RETBiOC ACa 

HUIMN RFP TRANSFORMMQ PROTEM IMC COMPLETE COS. 

HUWN RHO MRNA (CLONE 12^ 

HUMAN RHOe MRNA (CLONE IX 

HUMM RHOC MRNA (CUONE n 

^AAAAM MRNA POW RfiOPMOrtN L 

HUMAN P0LY(ADP4«B0SE) SYNTHETASE MRtC COMPLETE COS. 
HUMAN RNA POLYMERASE I aSKD 8UBUKT MRNA, COMPLETE C0& 
HUWNUt SIMIL NUCLEAR RIBOMUCUOP H OThlWTO KB PROTBN 
HUMM MPNA FOR Ul RNAAS80CMTE0 7«K PROTEM. 
HUMAN MMIP CORE PROrTBN At. 

HUMAN UROO GENE FOR LfftOPORFHYnNOGEN OECARBOKVLASE 

HUMAN UROPORPHYRINOGEN III SYNTHASE MRNA, COMPLETE CDS.

HUMM MRNA FOR 1 MO PRCFTEM OF SIGNAL RBOOOMnON PARTICLE 

HUIMN MRNA FOR PROTBNPHOSPWTABE C 

HUIMN M»M FOR RnOSOMAL PROTEM Ltt 

HUMAN MRNA FOR RffiOSOMAL PROT6N S1 1 . 

HUMAN RWOOOMAL PROTEIN SI 4 GDC COMfUTE COS. 

HUIMN RU080MAL PROTEM S17 MRNA, COMPLETE COS. 

HUMAN R»OSOMLL PROTBN S17 OBC COMPLETE COS. 

HUIMN R0OSOIML PROTEM SB MRNA, COMPLETE COS. 

HUIMN RB080MU. PROTEM MnC COMPETE COS. 

HUMAN SERUM AMYLOID A GENE, COMPLETE CDS.

HUMAN SKELETAL ALPHAACTM GE*C COMPLETE COS. 

HUMAN SERUM AMYUOR) A (SAA) IMC 

HUIMN CYST ATM SM IMC COMPLETE COS. 

HUIMN aPHNQOLlPD ACTIVATOR PROTEM I IMC t BO. 

HUIMN SERUM AMYLOO P COMPONENT (SAP> IMC OOIMETE COS. 

HUMAN MRNA FOR 88UH AHYUMD P OOMPONBfT (9APX 

HUMM MRNA FOR HLA CLASS ■ ANTIGEN SB SBETA CHAM 

HUIMN GE»C FOR MA CLASS I SM BETA CHAM (EXON 44V 

HUMAN QBC FOR MA CLASS I Sft4 BETA OMM (EXON 44^ 

HUMAN SHORT CHAM MYLCQA OEHTOROQBMSE MRNA, COMPLETE CDS. 

HMMN SCAR PROTON IMC COMPLETE COS. 

HUMU4 JE OENE ENCOOMG A MONOCYTE SECRETORY PROTEM GXON ft 
HUhMNJEQENE ENCOOBia A MONOCYTE 8CCRET0RY PROTTEM COMPLETE 
HUMAN PROTGOLVnC 8ERME ESTERAS&LMI PROTBN (SECT) QBC 
HUMAN SEMBIOOBJN PROTBN MRNA, COMPLETE COS. 
HUMU«8£RBC0emiRATAS6MFMA, COMPLETE COS. 
HUMAN 8ECRET0QRANM N GENE. COMPLETE COS. 
HUIMN NAWQLU006E COTTMNBPORTER 1 M»C COMPLETE COB. 
HUMAN MRML FOR ERYTHROCYTE MBMRMC BMLOQLYOOPROTEM BETA. 
HUIMN C^ PROTOONCOQENE PORPLATELET^RNB) GROWTH 
HUM8UPQ
HUMBUm 



NUMBP136 
HUWB 



HUMTtt* 



HUMTM 

HUMICS 

HUMTCAM 

HUMTCMS 

HUMTCAXi 

HUMTCAVC 

HU MTCT YY 
HUMfCSVZ 



HUMTTOOH 



HUMravM 

HUMTCmC 
HUMICmiO 



NUMTCtM 

HUMTCXMA 

HUMfrVTA 

HUMTF 

HUMTFP 

HUMTPPt 
HU MTFPC 

HUMTnn 



HUMTOFAM 
HUtfrOF* 



HUMTHD 

HUtfTHM 

HUMMM 

HUMTWI 

HUMMWtA 

HUMnVMM 

HUMTHR 
HUMTHTIA 
HUMTHVtie 
HUMrnVM 



HUMTHyMU 
HUMTHVP 

HUMmyt 



Huwm 

HUMTK 



HUMAN C«B MMA EICOaNa PUTEL£T-«CIWEO QIOimtFACTOn a 
HUMN C4»PUTQ£r-0CPNEO onOWVTH FACTOR 8 (SttPOOFI) 
HUIM4 8U1 GQC FCR tecrCTORr t£UNOCm PnOTEAK MWrOR. 
NUMAM fllfl IMM rmQie«r FO« flCCICTQm IMOCYTE Pf»^^ 
HUIMN 8nC4M (BUQ IffMk OCMPlCn con 

HUMM< 8 »LOPH0 n W| CO U| lff H^COiil£TEC0a 



HUMTm 
HUMTNT* 
HUMTNraA 
HMfTOPI 

HUMTPI 



HUIiWlJUPU8AWCA>mqEMpMAaiftJCUEARI»0NUtmU1>UIW 
HUM W9 M4aMUCl£WRaONUCIfiO<WrBMfa< nP lEQBg.PO*<5 
HUMMI AUrCANnOOl SMf^ NUOM RMNuonpfwreN s»w 
HUMAN CUOM SUPenaOE MMUTAK (800) IMt^OOMPI^ C0& 
HUMM SUPEROOaOC OttMUTAflE (tOO-l ) MRNA. OCUREIE en. 
HUMAN MFWA R» MANOMOE-COmMMNa aUPBOOOt DOMUrASE. 
HUMAN EXTRACEUJUIAMUPeiONOC Om/TAiC (MOO} IMM. 
HUMAN SUPCfOSaOE DMMUTASE (M0>1 1 OCNE. DCN » A»e FIAMA 
HUIMN aOMATOSTATM I QBC AND RAMS. 
HUMAN SERME PROTEMMCE mOTBN MRM^ OOMPICTE COS. 
HUMAN PUUOMIIV SURFACTAMT PWm (8PS1 MWA, OOMf)l£TE C0& 
HUMMI aMDOVTGOICCTM MRMk OOMPIETE COtw 
HUMAN OVTEOMCTM QCNC. BXON la 

HUMAN iM QBS ENOOOMQ PUMONARV SURFACTANT PROTEK 
HUMAN PUMONAirr SURFACTANT mcnrCOU>n(IFMFVMi) MRNA. 
HUMAN FRQSTATE SORTED BSNAL FUMMA PROTON MRM^ OOMPIETE 
HUMAN SMAa PROIME RBH PROTEM (8PRQ MRN^ CLOIC IMl 
HUMAN SMAa PROUC REH FR0rOI(8PRi) MRNA, CtONE ISft 
HUMAN SMAaPROUNE RBH PROrOI (M) MRNA. CUME m 
HUMAN SMML PROUM RKN PROVBN(SPRiO CtjOIC MO. 
HUMAN SMAU. PRCUNE RKH PROTEMCSmO MMk CLONE 1T4N. 
WMANIWHAmiPHLHUNSL 



HUMTPSIS 
HUMTPA 
HUMrPAf4 
HUlfTPAft 



HUMAN C4RC-1 PROnOONCOQOC ElKM 
HUMAN Sen UM fO POW W FACTOR (SWF) MRM(^ COMPUTE CDSl 
HUMAN STSROD RBCVrORTm MRMk OOMPIETE COS. 
HUMAN STAIWVN MRM^ OOMRCTE COS. 



HUMAN STEROO SUFATME (ST^ MRNfk OOMPtlTE COSl 

HUMAN STGROD SUVATME ^NCROSOMA^ OOMPtETECOS. 

HUMAN STATHDW MMA, OOMFISTE COS. 

HUMAN (r-«)OUQQAOENnATE SYNTHETASE (»« SVNTtCTASE)^ 

HUMAN MRNA (AIMS) FOR OEM ttS FROM nsCf10NM.T-CEU. 

W MMHSC n p r mU ^ AeS O CI A T CDPna T W TAUIffN^COMPtETECOS. 

HUMAN SETA-TIMUUNOM (SSETA) WTTH TEN HJJ FAMKY 

HUMAN SCTA-TIMM OEIC CLONE MMi 



HI,iMN I NW A FOR FAST SttLETALT nOPO MN C 
HUMANT^aiAGTNATnN QENE »(rCAI| MRM^ OOmETl COSl 
HUMANT-CEaREC9rORACTMALPHA«HAtl MRNA FROM JHCQi. 
HUMANT-CEURKVrOR ACTM ALPHACHASI MRNA, OOMUTE COS, 
HUMAN T<Ctll. RBCSFTOR RBWRANOED AlPHACHAil VflEOKM 
HUMANT'CeXnCEFTOR ACTMB StTA«HMH MRNA FROM Cai UW 
HUMAN T'Cai. MK0TOR ACTWE KTACNASI MRNA, OOMaSTE COSL 
HUMANT-CEU. RBVrOR nCARMNOED BCTACHMN V^eOION (V-DJ) 
HUMAN QENi FOR SOKTSOUOOPROTCN (TMELTAOWN) OF T-Cai 
HUMANTRAIMOSMAMSI I MRM^ OOMPIETE COS. 
HUMANT Cn iSF t» RC MHMA (YTW) FOR FTKHOMOtOQOUB PROTON 
HUMANT-C&L RECEPTOR COS^MSM OBC EIKMIl 
HUMWI MRNA FORTCROEITACHASL 

HUMAN T-OEa RKEPTOR MMA CHMN VUOCSCN flUKM IISM. 

HUMANT-Cai RECEPTOR DELTA OMM MRW {MC4BQ»N]i 

HUMAN M R NA FORTXEaRBCEPTOR ft ANMA CI MS t , 

HUMAN MRNA FORT^CELLRCCB^TIQAMM POLYPEPTDE 

HUMWI T'CEU. ANTUDI RECVrORQBC TMaTA. 

HUMAN MRNA FORTS 9SftONCNASI(anOOFT'CELI RECEPTOR 

HUMANT Cai/SPGCmC PROrTEMfRANrCS) MRN^ COMPLETE COa 

HUMANT-CEU. DffFWBffUCraN ANHQEN LEU«r* MRNA, PMTTIAL 

HUMAN m SilALT W AIMF EnASE M R NA, OOM«TI COS. 



»tJMAN PIACCNTALTaSUC MCTQR (TWO FORMS) mm, COMPLETE 
HUMAN TMUE FACTOR MRMk OOMAITE COS, WTTH AN AUI REPEAT 
HUMAN TMSUE FACTOR OEM, COMPLETE COOL 
HUMAN TMUE FACTOR QENE. OOMRETE COS, WTTH A AUI 
HUMMITRAMFERRSI REO^rOfl MRNA. OOMPL£TE COS. 



HUMAN (Cai LM ME7 FS;) TRANSFORMSM ORCMTTH FACTORAU>m 
HUMAN TRAMFORHMa QROWTHfACTOMCrACraFWA) MRMk 
HUMAN TRANSFORMMQ OROMmt FACrOIWETA« MRNA. OOMPiCTE COS. 
HUMANTRAMFORMSa OROWTH FACTOMETA SCTOFWAI) MRM, 
HUMAN HERATC TRMLVCERBIURASK IMS^ OOMETE COS. 
HUMAN PROVmOMBMOeS. OOMFUTS COS, AND ALU AM» KPM 
HUMAN THVROO HORMONE BSONO PRCTESI (PK) MRMV, COMPLETE 



HUMAN ENDarHBMLC&I.T»SK]MS0M0OUUNMRM. COMPLETE C0& 

HUMAN THVMBOMOOUUN OENE. OOmCTECOS. 

HUMAN MRNA FOR TYROS0C HniROXnASE TYPE A 

HUMAN THTROS HORMONE RCCEPTOR ALMA 1 (nWMLPHA.|) OCNE, 

HUMAN MONEY &MWI1PHA40M DttOONiQTHmOO HORMOIC 

HUMAN THYROnAUTOMim 

HUMANTVROSSC HYOROKYIASE TYPE 4 MRI^ COMPETE COS. 
HUMAN THY-f OLVOOFROTEil QEIC, O0M>LBTE COS. 
HUMANTHYMOSSI BBTA-fO MRNA. OOMFICTE COS. 



HUMAN PRCmfYMOSSI ALPHA MRNA. COMPLETE COS. 
HUMAN PROTHYMOSM MPm HRMk OOMaCTC COBL 
HUMAN RARATNVMOSSI MRMk OOmETE COS^ 
HUMAN THYMIDYLATE SYNTHASE MRNA.

HUMAN M O RA FDR TMSU E ISM ff OR OF METAilJOPRCroNASES(TIMP)> 
HUMAN aMNAFORLVMFNOCVrEOLYOOPRCrrESITlAEl^^l. 




(TO^ MRMk OOMnETE COS. 
NNASE RCC8)TOR(EP»0MRm. OOMAETE COS. 
E NNME OOS, CCHPLKreCOn, WTTH OUSTEnEO 
T>««MS0M0OULM QENE, OOMPIETECOS AM) FMNKS. 

n i no susTTRO>PMv osw Tioo{PgL 

MaETALTROPOMNC (TNC^ 

' FACTOR QEIC COMnfTE COS. 
FACTOR (TW) MRNA. 

FACTOR AND LYMPHCTOOON QENES. COMPLETE 



HUMTPI 

HUMTPMY08 

HUMTPO 

HUMTPQA 

HUMTPOS 

HUMTRA 

HUMTRQCM 

HUMTRKZH 

HUMTRKPOA 

HUMTRKR 

HUMTRL 

HUMTRO 

HUMTROPA 



HUMTROPSR 
HUMTSn 



HUMT8HSS 

HUMT8HM2 

HUMTUSAQ 



HUWrUBSM 
HUMTYRA 



HUMUDPGT 



HUMUKPPE 



HUlWAC 
HUMR 



HUMflUR 

HUWM 

HUIMPI 



HUlMTlit 
HUMONSP 



HUHCVPAA 
HUMYB1A 



MACAPORI 
MACEPO 



IMCPQA 
MACPOMC 

^§^CBQM 

MACTPtr 



ORAOLTQ 



OfMHSO 
0RA»MQ1F 



ORANMOL 



HUMVf LYMPHOrrOOON (r»FW A) aC»C COMPLETE COS. 

HUIMN SUMY SKELETAL MUSCLE TROPOMMT MRNA. CtOIC HZ3K 

HUMAN SLOW MCLETM. MUSCLE TTCRONN T MRNA, CLONE ttt. 

HUMAN TOPOISOMERASE I MRNA, COMPLETE CDS.

HUMAN ONATOPOMMBWSC ■ (TOPa) MRNA, OOMPLETB COB. 

HUMAN MRNA FORTRMMTKM PROTEM 1 (TP1 X 

HUMUH FBI CELiUARTUMOR ANTOEN IflNA. COMPLETE COa 

HUMN FBI CELLULAR TUSOR ANnOEN IMSL. COMPLETE C0& 

HUMAN TMSUE FLASMNOQEN ACTIVATOR (T4>A)QaC. COMPLETE 

HUMAN TISSUE-TYPE FLASMNOaCN ACTTVATOR (T-PA)OE»C EXON 14. 

HUMAN TMSUE-TVPE PLAMSttOOl ACTUATOR (T-PA) HRNA lACKMO 

HUWN FCTM-LiMQ nAHMOOEN ACnVATOR (T-M) ICSM (EC 

HUMAN TRIOSEPHOSPHATE ISOMERASE MRNA, COMPLETE CDS.

HUMAN IffSIA FORWELETAL BETA^TROFOMVOSSl 

HUMAN MRNA FOR THYROPEROXIDASE.

HUMAN THYROID PERORKkASE MRML CLONE PHTPM A 

HUMAN THTROE) FEROHDASE MRMk CLOM PHTP04.4 

HUMAN TRA1 STRESS PROTON MRNA, COMRETE COS. 

►MMNISWA PORT«ILHEABRMj^ QEHEfTRQ) 

HUMAN TRK PROTOONCOOeC SMCRT OF PLML 
HUMAN MfSM OF TTK ONOCQDC 

HUMAN MTTOCHONORML ADRIADT TRANBUOCATOR MRN^ COMPLETE COS. 
HUMAN TROPOMYOSIN MRNA, COMPLETE CDS.

HUMAN FMROSLASr MUBCtfrTYPE TROPOMYOSM MRM. COMPLETE COS. 
HUMAN SICLETAL MUSCLE ALFHA-TROPOMYOOSI (HTMALPHA) IffWA. 
HUMAN U n MRNA FOR CYTOSKaiTAL TROPOMVOSSI TM3G(NM). 
HUMAN SWlETAt MUSCLE 1JKSSRNA FO R TWOPOMYOSK 
HUWkNTSII OeCBCOOMaAO-l PROQRCBSIONFRaTEH 
HUMAN THVROTROFSI (THYROn STMUUTSa HORMONE) SETA SUSUMT 
HUMMITHYROfROPM BETA(TSH«TA) SUKMT QBC OONB t AND 
HUMANTMYROmPSi SETASURUMTOBC, OCm tMD t 
HUMAN MPHA-TUSULMOEIC^AIPMA-llk COMPLETE COS. 
HUMAN ALFHA-TUSLAJN (FROM lORATWOCYTE CBiS) MRNA. 
HUMAN jETA>TU»ULj NOP«. COMPLETE COS. 

HUMAN MRNA FOR Ul 
HUMAN ISSSL FOR Ul 
HUMAN MRNA FOR US! 



A* moral 

OOmETECOS. 




I MPVS^ COMPLETE COS. 
HUMAN FRMJROMSMSi MRM^ COMPLETE COS. 



(LAPOT) 



HUMAN UROMOOULJNfTAMMMORBFAaaLYOOPROrTESQMRm. COMPLETE 

HUMAWUR0M0OUUN(TAMMIWWHIAq L Y Ui PR U I ti S MRIM, COMPLETE 

HUMAN UMP SYNTHASE MRNA, COMPLETE CDS.

HUMAN PLASSMKOBI ACTIVATOR MRN^ OONPLETE COS. 

HUMAN UFA OBg FOR U n OWNAS E WABMPIOq EN ACTTVATOR 

HUMAN U n OPOHFIf WM C OEN OB C AWSCI (Y LASEMWm.OOMPLErE COS. 

HUMAN MRNA FOR VASCUAR AMrVOMUANT. 

HUMAN VITAMIN D RECEPTOR MRNA, COMPLETE CDS.

HUMAN CYTOMLiil I (VU) MRNA, COMPLETE COS. 

HUMAN MRNA FOR WIK 



HUMAN VASORCm SirCSTBSIL POLYPEPnOE (VP) OENE. EICON S 
HUMAN VASQACTWi SffttTSML FPTWE AND » T K)S g M E TWOHWE 
HUMAN V^*g<CTWMWH^^ FOtYP tP I Wt W 0P«, EXON A 

HUMAN VMOPRBSSMCUROPHVSM I MRNA, COMPLETE COsf^ 

HUMMI CaX AOmON PROTBN (VmOOCTSS WCEPTOR ALPm 

HUMAN VfTRONECTBI (ftPRCTBIO MRNA, OOmETE COS. 

HUMAN VON WILLEBRAND FACTOR MRNA, COMPLETE CDS.

HUMAN MFNA OF ACQO CM SMQUD BICMDMC OMNULOMATOUB 

HUMAN EPOXIDE HYDROLASE MRNA, COMPLETE CDS.

HUMAN ZETA (OttOUaSI MWM, COMPLETE COB. 

HUMAN FETAL LWCT CYlOCfflOMEP 4»(IMB0HRAL COMPLETE 

YBORMNOSOPROIGM-I <YS-1)MRNt 

HUMAN USnumN OEIC (» REFEATR. 

t£MUR(SROMH) KTA4L08SI QM. COMPLETE COS. 

L£MUR{BRaiMnEPStOMaOSSIO0« COMPLETE COS. 

LBIUR(BR0MN)CMMMMU3SSICM,COtPLETICOB. 

LEMUR (OMMP) OMMMAOOSSI QOC OOmETE COS. 

LEMUR (RSM.TMLED) SSraUJORM QBC COMPLETE COS. 

M ^ ASCCULA IW (CYNOMOLflUS) MOWMY /POLPCPRPTEW A-l MRNA, 

MOIBCY (CYNOMOtaUS) DIYTMROPOKTW MRNA. OOmETE COS. 



RICSUS MOMCY KTACUUSTBt r FYTALflMMMAOOMNaaC 
jWB W MOtSg QLOBBIOBK, 



NA(PaA)MmA, COMPLETE COS. 
A PRMPtOMBANOOORTM (POMC) MRMk COMPLETE 
MPASOCULARM SOMATOSmTSII MNA. COMPLETE COSl 
MACACA MUUTTATRnSGFHOSPHATE MOMEfMSEOBC. EXON 7. 
8PVER MOMIEY VLOEOFFROn DfiJA«LOSM OW COMPLETE COS. 
MOMCYMETAL10nSOieNI(MT1)IMSL 
MOMCY MTALLOTMONESI ■ (MT^ MRNA. 

O R ANO LffAN ALPHA OLflSW LSg T>«TA(1>OtOSSI QENE DOMNBTREAM 
ORANQUTAN ADULT ALPHAMLOSStOeC 
0RANaUTANA0ULTAIPHA.14LOSMaea. 

ORANOLffAM(PJ»YqMA EU B) tF SS n WQLflSBiqe«. COMPLETE COS AND 
ORANOUTAN 0G7A OLOSSI QENE.OOMPLSTt COSl 
ORANOUTAN aAMMA.1#VrALaL0SB«QDC. COMPLETE COS. 
0RANaUTAN riM IRS HrE TALqLDSStQPg.OOAB»LETgC0S. 
ORANMITAN SMOtUCmi QE»C COMPLETE COS. 



TJYTSCHTA BETA OLOSM OENE, COMPLETE COS. 
TjSYRKHTA DELT AOjOSSI OENE, COM>LETE COS. 
FOR TMUE 2



HUM    HUMAN
HAM    HAMSTER
MUS    MOUSE
RAT    RAT
BOV    BOVINE
DOG    DOG
PIG    PIG
RAB    RABBIT
SHP    SHEEP
CHK    CHICKEN
FSB    FISH (GROUP B)
XEL    XENOPUS LAEVIS
BMO    BOMBYX MORI
CEL    CAENORHABDITIS ELEGANS
DDI    DICTYOSTELIUM DISCOIDEUM
DRO    DROSOPHILA MELANOGASTER
PFA    PLASMODIUM FALCIPARUM
SCM    SCHISTOSOMA MANSONI
SUS    SEA URCHIN
TRB    TRYPANOSOMA BRUCEI
ASN    ASPERGILLUS NIDULANS
ATH    ARABIDOPSIS THALIANA
BLY    BARLEY
MZE
In studying beetle bioluminescence in the early 1960s, Dr McElroy and his colleagues found
that the Jamaican click beetle, Pyrophorus plagiophthalamus, was capable of emitting 
different colours of light. They further found that the luciferin substrate used by this beetle 
was the same as that in the firefly, demonstrating that the different colours of 
bioluminescence were due to differences in the structure of the luciferases. We have recently 
cloned cDNAs from this beetle species which code for at least four different luciferases. The
luciferases are distinguishable by their different colours of bioluminescence when expressed 
in Escherichia coli. The sequence differences between these different luciferases are few, so 
the amino acids responsible for the different colours of emission must also be few. Through 
the construction of hybrid luciferases, by rearranging fragments of the original cDNA clones, 
we have identified some of these amino acid determinants of colour. 

Keywords: Firefly luciferase; click beetle luciferases; bioluminescence spectra 



INTRODUCTION 

In 1963, a scientific expedition to Jamaica was led 
by William McElroy to study bioluminescence. 
There they encountered the 'kitty boo', a local
name for the large bioluminescent click beetle,
Pyrophorus plagiophthalamus. The beetle
attracted the scientists' attention because of its
ability to emit different colours of bioluminescence,
a feature not found among species of true
fireflies (Seliger et al., 1964). It has two sets of
light organs, a pair on the dorsal surface of the 
head, and a single light organ in a cleft on the 
ventral surface of the abdomen. The dorsal 
organs emit greenish light, while the ventral 
organ usually emits yellow or orange light. But 
the differences in colour do not occur only within 
individual specimens. Considerable variation also
occurs between specimens for each set of organs.
The colour emitted from the dorsal organs can 
range from green to yellow-green, and that of the 
ventral can range from green to orange. It was 
shown by these scientists that the differences in 
colour were not due to differences in the 
substrates of the luminescent reactions. The 
luciferases of these click beetles, or of any other
bioluminescent beetle, utilise ATP and the same 
luciferin that was first elucidated in the chemistry 
of the firefly luciferase. It was concluded that the 
different colours of bioluminescence were caused 
by subtle differences in the interaction of the 
substrates with the enzyme. Unfortunately, 
attempts to study these luciferases further were 
limited by the difficulties of collecting sufficient 
quantities of the beetles. Recently, we have been 
able to overcome this problem with the use of 
molecular cloning techniques. cDNAs generated
from the ventral light organ of this beetle have
been cloned and expressed in Escherichia coli
(Wood et al., 1988a, 1988b). These cDNA clones
are of four different types, distinguishable by the
colour of light emitted by the luciferases they
encode: green (546 nm), yellow-green (560 nm),
yellow (578 nm), and orange (593 nm). These
cDNA clones not only afford a practical source of
the click beetle luciferases, but also provide a
means of separating the different types, since each
clone is expressed in a separate host. Given here
is the first report on the structural features of
these click beetle luciferases that are responsible
for the colour of light emitted.



AMINO ACID SEQUENCES AND THE 
DETERMINANTS OF COLOUR 

Four cDNA clones, each capable of eliciting one
of the four colours of bioluminescence in E. coli,
were sequenced. With this information the amino
acid sequences of the corresponding luciferases
were deduced (Fig. 1). The open reading frame
of each cDNA has a coding potential for 543
amino acids, seven amino acids fewer than that of
the firefly luciferase cDNA. From a lysate of E.
coli expressing click beetle luciferase, Western
blot analysis revealed the presence of a single
antigenic band which comigrates with the native
enzyme (Wood et al., 1988b). Since it is unlikely
that a post-translational cleavage of the luciferase
could occur at a common site in both beetles and
bacteria, such cleavages probably do not occur in
either host. However, the resolution of the blots
is not sufficient to rule out deletions of fewer than
about twenty amino acids. Other possible
post-translational modifications to the luciferases are
also unlikely to occur in either host for similar
reasons. At the least, such modifications are
limited to those that would not result in a
significant alteration in the migration of the
luciferases in SDS-polyacrylamide gel
electrophoresis. The simplest explanation of the
Western blot data is that the luciferases are
expressed in either host directly as the mature
enzymes from their mRNAs.

Analogous observations have been made previously
in the comparison of native firefly
luciferase with that produced from its cDNA in E.
coli. In this case, it was also shown that firefly
luciferase produced by in vitro translation of its
mRNA comigrates with native enzyme
(Wood et al., 1984), further indicating the
absence of substantial post-translational
modifications. However, some form of modification at
the N-terminus of the native enzyme is implicated
by our inability to obtain an amino acid sequence
by Edman degradation. It is not at present known
whether the N-terminus of firefly luciferase
expressed in E. coli is also modified. Nevertheless,
for both the firefly and click beetle
luciferases, it is certain that there are no
modifications required for enzymatic activity that
are unique to the beetle metabolism, since the
luciferases are active when expressed in exogenous
hosts such as E. coli. Similarly, for the click
beetle luciferases, variation in colour also is
independent of possible beetle-specific modifications.
Given the general lack of similarity
between beetle and E. coli metabolisms, it is
unlikely that there are any essential modifications
of any sort. Thus, we work from the premise that
the effective structure of the luciferases is simply
the folded polypeptide chains which are coded by
the cDNA clones.

Comparison of the amino acid sequences of the 
four luciferases shows a high degree of similarity. 
Pairwise comparisons of the sequences reveal 
from 94% to 99% identity (Fig. 2). This is in 
contrast to their comparison with the firefly 
luciferase sequence, with which they are less than 
50% identical (Wood et al., 1988b). In terms of
amino acids, the sequences differ from each other 
by 5 to 32 residues. This information can be used 
to construct an evolutionary tree of the luciferases 
(Fig. 3) (Fitch and Margoliash, 1967), which 
shows that the luciferases have evolved in the 
order of the colour they emit. The earliest of the 
click beetle luciferases is the green-emitting 
luciferase, and the latest is the orange-emitting 
luciferase. The amino acid differences between 
the click beetle luciferases are the only differ- 
ences between the E. coli capable of expressing 
the different colours of bioluminescence. There- 
fore, this set of amino acid differences must be 
responsible for the differences in the colour of 
bioluminescence. 
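
The relationship between the percentage identities and the counts of differing residues quoted above can be checked with a short calculation. The Python sketch below is illustrative only: the four short sequences are hypothetical placeholders (the real sequences are those of Fig. 1, which is not reproduced here), and the function names are assumptions of this sketch rather than anything taken from the original work.

```python
from itertools import combinations

# Hypothetical, pre-aligned toy sequences; NOT the real luciferase sequences.
TOY_SEQUENCES = {
    "green": "MVKREKNVIY",
    "yellow-green": "MVKREKNVVY",
    "yellow": "MVKRDKNVVY",
    "orange": "MVKRDKNAVY",
}

# Residues per click beetle luciferase, as stated in the text.
LUCIFERASE_LENGTH = 543


def count_differences(seq_a, seq_b):
    """Count positions at which two aligned, equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


def identity_from_differences(n_differences, length):
    """Convert a count of differing residues into a percentage identity."""
    return 100.0 * (length - n_differences) / length


def pairwise_differences(sequences):
    """Return {(name_a, name_b): number of differing positions} for every pair."""
    return {
        (name_a, name_b): count_differences(sequences[name_a], sequences[name_b])
        for name_a, name_b in combinations(sequences, 2)
    }


if __name__ == "__main__":
    for pair, n_diff in pairwise_differences(TOY_SEQUENCES).items():
        print(pair, n_diff, "differences")
    # The extremes reported in the text: 5 and 32 differing residues out of 543.
    for n_diff in (5, 32):
        print(n_diff, round(identity_from_differences(n_diff, LUCIFERASE_LENGTH), 1))
```

For a 543-residue sequence, 5 and 32 differences correspond to roughly 99.1% and 94.1% identity, which agrees with the 94% to 99% range given above.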

The effect that these amino acid differences 
have on the spectra of bioluminescence is very 
uniform. This can be demonstrated in a plot of 
the peak positions versus their respective widths 
at half maximal intensity for each of the four 
colours (Fig. 4). These parameters are presented 
in terms of wave numbers since this parameter is 
Figure 1. Amino acid sequences of the four click beetle luciferases capable of emitting four different colours. Shown in
entirety is the sequence of the yellow-green-emitting luciferase. Above and below that sequence are shown only the amino
acids at positions where the other luciferase sequences are different. Shown above the sequence are the amino acid
differences for the green-emitting luciferase. Shown one line below the sequence are the amino acid differences for the
yellow-emitting luciferase, and shown two lines below are the amino acid differences for the orange-emitting luciferase. A
dash indicates no difference from the yellow-green-emitting luciferase. Shown in bold type are the relative positions of
restriction endonuclease sites in the corresponding cDNA clones (the first BstXI site is not found in the green-emitting clone).
The numbers on the right indicate the positions of the last amino acid in each line.
Figure 2. Pairwise comparisons of the sequences of the four click beetle luciferases. The percentage identities of the
sequences are shown in the upper right; the numbers of amino acid differences between the sequences are shown in the lower
left (shaded area).
Figure 3. Evolutionary tree of the click beetle luciferases.
Distances on the tree branches are given in number of amino
acid differences between the sequences. The root of the tree
was determined by comparison with the firefly luciferase.
Figure 4. Plot of peak position versus peak width for each of
the click beetle luciferases. Peak position is given as the
wave number at which maximal intensity is found. Peak
width is given as the width of the spectrum, in wavenumbers,
found at half of the maximal intensity.



proportional to the energy of the photons 
emitted. In this form, the peak positions and 
widths are descriptive of the energy states of the 
light-emitting complex. The emitting complex 
refers to both the specific oxyluciferin 
fluorophore and its associations with the protein. 
The uniform manner in which these parameters
vary suggests that the alterations in colour are not
produced by discrete changes in the nature
of the light-emitting complex. Hypothetically,
such discrete changes could be tautomerization
or ionization of the emitter, or coupling of
the emitter to other conjugated pi-bonded sys- 
tems. Instead the predictable variation in the 
peak shapes is symptomatic of a modifying effect 
on the light emitter that can be varied in a 
continuous manner. Such modifying effects could 
be general changes in the dielectric constant 
around the emitter, or alterations of the electro- 
nic environment around specific key atoms in the 
emitter complex. The fact that the spectra of the 
four colours are nearly evenly spaced should not 
be construed to be a natural quantization of the 
system. Spectra taken from the ventral organ of 
live click beetles reveal other colours of biolu- 
minescence, within the range of colours of the 
clones, which do not match any of the four 
colours from the clones (Biggley et al., 1967). 
Presumably there are other members of this 
highly conserved group of click beetle luciferases 
that have not yet been cloned, which are capable 
of emitting different colours. Among all biolu- 
minescent beetles there exists a large number of 
subtle shades in the colour of bioluminescence. 
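
Because the spectra are discussed in wave numbers rather than wavelengths, it may help to see the conversion applied to the four peak wavelengths reported for the clones (546, 560, 578 and 593 nm). The Python sketch below is a minimal illustration: the function names are assumptions of this sketch, the constant for h·c is the standard value in eV·nm, and no peak widths are included because none are given numerically in the text.

```python
# Peak emission wavelengths (nm) reported in the text for the four clones.
PEAK_WAVELENGTHS_NM = {
    "green": 546.0,
    "yellow-green": 560.0,
    "yellow": 578.0,
    "orange": 593.0,
}

# Planck constant times the speed of light, expressed in eV*nm.
HC_EV_NM = 1239.84198


def wavenumber_cm1(wavelength_nm):
    """Convert a wavelength in nm to a wave number in cm^-1."""
    return 1.0e7 / wavelength_nm


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV; proportional to the wave number."""
    return HC_EV_NM / wavelength_nm


def width_cm1(short_edge_nm, long_edge_nm):
    """Width in cm^-1 between two half-maximum wavelengths (caller-supplied)."""
    return wavenumber_cm1(short_edge_nm) - wavenumber_cm1(long_edge_nm)


if __name__ == "__main__":
    names = list(PEAK_WAVELENGTHS_NM)
    peaks = [wavenumber_cm1(PEAK_WAVELENGTHS_NM[n]) for n in names]
    for name, peak in zip(names, peaks):
        energy = photon_energy_ev(PEAK_WAVELENGTHS_NM[name])
        print(f"{name:>12}: {peak:8.1f} cm^-1, {energy:.3f} eV")
    # Spacing between adjacent peaks, to compare with "nearly evenly spaced".
    for (a, pa), (b, pb) in zip(zip(names, peaks), zip(names[1:], peaks[1:])):
        print(f"{a} -> {b}: {pa - pb:.1f} cm^-1")
```

On this scale the four peaks fall at approximately 18,315, 17,857, 17,301 and 16,863 cm^-1, so adjacent clones are separated by roughly 440 to 560 cm^-1.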

It should be remembered that the number of
amino acid differences between the four sequences
represents only an upper limit to the
number required to produce the alteration in
colour. As will be shown below, the number of
amino acids that determine the colour of
bioluminescence may be only a few. In the simplest a
priori model, the colour would be determined by
the chemical properties of four different amino
acids found at a single position in all four
luciferase sequences. Changes in the characteristics
of the amino acids at this position, such as
charge or hydrophobicity, would modify the
colour of emission from one clone to the next in
the uniform manner observed. However, with
only the limited set of twenty amino acids
available, it is difficult to find a uniform
progression of chemical properties that could
account for all four colours. The situation is worse
when the model must also include the other
colours from the click beetle luciferases that have
not yet been cloned. In fact, comparison of
the click beetle sequences reveals that there is no
position in all four luciferases which changes to
more than one other amino acid. Thus, as is
typically the case, the contribution of individual amino
acids to enzymatic activity is more subtle than the
simplest model would predict.

METHOD FOR CONSTRUCTION OF HYBRID 
LUCIFERASES AND SPECTRAL ANALYSIS 

Site-directed mutagenesis has become the stand- 
ard method for determining the function of 
specific amino acids within an enzyme. However, 
to be practical, the method requires knowledge of 
which amino acids to modify. In our case, there 
are a total of 31 sites which vary between the
four luciferase sequences. To mutagenize each
one would be very time-consuming and costly. As
initial experiments, to reduce the number of 
options, we chose a method that would allow us 
to change several amino acids simultaneously 
(Fig. 5). Since the luciferase sequences vary by 
only a small amount, most of the restriction 
endonuclease cleavage sites in the corresponding 
cDNA sequences are conserved in all four clones. 
We are therefore able to express hybrid
luciferases in E. coli by cleaving the cDNA clones
with restriction endonucleases, and recombining 
the DNA fragments in new arrangements. Often, 
though we are exchanging large regions of 
sequence information, we are only changing the 
identities of a few amino acids. By analysing the 
pattern of amino acid substitutions for a large 
number of such rearrangement hybrids, we hoped 
to identify a subset of key residues that affect the 
colour of bioluminescence. For this type of 




Figure 5. Method of constructing hybrid luciferases. Luc-
A(tac) is the tac-containing vector (upper left). Luc-B(BS) is
the Bluescript-based vector (upper right). Each vector
contains a luciferase cDNA (luc) and a penicillin resistance
gene (amp). The hybrid luciferase cDNA is constructed by
cutting each parent vector at the 'cleavage site', and
recombining the appropriate fragments.

analysis we assume that the variant amino acid
positions function independently of each other, and
look for experimental evidence to the contrary. 
For the most part, this assumption has been 
consistent with our observations. 

To allow verification afterwards that the 
rearrangement hybrids we were constructing were 
indeed made properly, we used a strategy
involving two different types of plasmids. One 
plasmid is a small expression vector which 
contains the tac promoter. At present it is the 
vector which yields the greatest expression of 
bioluminescence. The other plasmid, Bluescript,
was derived from the cloning vector with
which the luciferase clones were originally isolated.
This plasmid can be distinguished from the
tac-containing vector by the presence of a unique
KpnI restriction site. The construction is performed
by cleaving both plasmids, each containing
a different luciferase cDNA clone, into two
fragments with a chosen set of restriction 
endonucleases. One fragment contains a portion 
of the luciferase cDNA and the entire plasmid 
vector. The other fragment is the remaining 
portion of the cDNA. A tac vector containing a 
hybrid cDNA is formed by ligating the vector- 
containing fragment from the parent tac vector, 
with its complementary fragment from the Blues- 
cript vector. The incorporation into the final 
product of the correct vector-containing frag- 
ment, with its associated portion of the luciferase 
cDNA, can be quickly confirmed by the absence
of the KpnI site. The other fragment, which
contains only luciferase sequence, must be 
confirmed by DNA sequencing since there are no 
unique restriction sites to identify one luciferase 
cDNA from another. The one exception to this is 
the green-emitting clone, which lacks a BstXI
site which is found in the other three cDNA 
clones. 

The spectra of bioluminescence from E. coli
containing click beetle luciferases can be mea- 
sured either from intact cells or from cell lysates. 
It has been shown for each of the luciferases that 
the spectra measured by either method are 
identical (Wood et al., 1988b). Since it is
technically simpler and less time-consuming, we 
chose to measure our spectra directly from living 
cells. The cells are grown on nitrocellulose filters 
on top of nutrient agar. To initiate biolumines- 
cence, the filters are removed from the agar and 
soaked with luciferin for 5 to 10 minutes as 
described (Wood and DeLuca, 1987). The filters 
are then blotted dry and placed in the spectro- 
meter for measurement. 

The spectrometer is a Fastie-Ebert type as 
described (Seliger et al., 1964). The output of the
photomultiplier tube is amplified and digitally 
converted for direct input into an IBM PC 
compatible computer. Each sample was scanned 
five times at a rate of approximately 8 nm/s over a 
total period of 5 minutes. Noise in the spectra was 
reduced by applying a digital curve-smoothing 
routine to the data. The set of five spectra was 
then corrected for the wavelength-dependent
sensitivity of the photomultiplier tube, and for
time-dependent variation in the intensity of the
bioluminescent sample. The final spectrum was
then computed as the average of the five original
spectra. The precision of this method varies,
depending on the intensity of the sample. Except for
very weak samples, there is a greater than 95% 
confidence that two spectra are different if their 
maxima differ by 2 nm or more. This confidence is 
greater for the more intense samples. For very 
weak samples, several sets of spectra may be 
averaged together to reduce the effects of noise. 
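
The processing chain described above (smooth, correct, average, locate the maximum) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' routine: the moving-average smoother, the division by a detector sensitivity curve, and the unit-area normalisation used to stand in for the correction for time-dependent intensity changes are all assumptions, as are the function names and the synthetic data.

```python
import math

def smooth(values, window=5):
    """Centered moving average (assumed smoother); the window shrinks at the ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def average_spectrum(scans, sensitivity, window=5):
    """Smooth each scan, divide by detector sensitivity, normalise, then average."""
    corrected = []
    for scan in scans:
        s = [v / k for v, k in zip(smooth(scan, window), sensitivity)]
        area = sum(s)  # unit-area normalisation removes overall intensity drift
        corrected.append([v / area for v in s])
    return [sum(col) / len(corrected) for col in zip(*corrected)]

def peak_wavelength(wavelengths, spectrum):
    """Wavelength at which the averaged spectrum is largest."""
    return max(zip(spectrum, wavelengths))[1]

# Synthetic example: five scans of one emission band whose intensity decays
# from scan to scan (hypothetical data, flat detector response).
wavelengths = list(range(500, 651))
scans = [[a * math.exp(-((w - 560) / 30.0) ** 2) for w in wavelengths]
         for a in (1.0, 0.9, 0.8, 0.7, 0.6)]
flat = [1.0] * len(wavelengths)
averaged = average_spectrum(scans, flat)
peak = peak_wavelength(wavelengths, averaged)
```

With real data the sensitivity curve would come from a calibration of the photomultiplier tube, and several such averaged sets could be averaged again for very weak samples.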



IDENTIFICATION OF SPECIFIC AMINO ACIDS 
WHICH AFFECT THE COLOUR OF 
BIOLUMINESCENCE 

By using different combinations of restriction 
enzymes on the cDNA clones, the four sequences



were recombined to form four sets of rearrange- 
ment hybrids with five to eight members per set 
(Fig. 6). These hybrids, combined with the four 
original clones, form a total of 31 different 
luciferase sequences. Fig. 1 shows the relative 
positions of restriction sites which were used in 
the amino acid sequences of the luciferases. Also 
used in the constructions were two sites in the 
vectors near the ends of the cDNA inserts: 
BamHI at the 5'-end, and XhoI at the 3'-end. The
first set of hybrids was constructed using NcoI and
BamHI. This separates the last two variant amino
acid positions from the remainder of the sequences.
The second set of hybrids was constructed
using ApaI and XhoI, which separates the first
two variant positions from the remainder. The
two BstXI sites, which separate approximately
the central third of the variant positions, were 
used to construct the third set. This could not be 
done with the green-emitting clone because the 
first BstXI site was not present. This fact was used 
in the construction of the fourth set, which used 
the BstXI sites and the BamHI site to rearrange
the final third of the variant positions. For those
clones which contained two BstXI sites, this 
resulted in cleavage of the fragment which 
contains only luciferase sequence into two. These 
fragments were combined and treated in the 
procedure as if they were only one, which was 
possible because the BstXI sites are not self- 
compatible. Thus, in the final ligation, the 
fragments can only recombine in one orientation. 
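
The four construction schemes can be pictured as swapping segments of the 31 variant positions between parents. The sketch below is only an illustration: it takes the variant-residue strings listed for the four wild-type clones in Figure 6, splits them at the four internal restriction sites, and joins segments from different parents. The names `WILD_TYPE` and `hybrid`, and the numbering of the boundaries (1 = ApaI, 2 and 3 = the two BstXI sites, 4 = NcoI), are assumptions made for the example.

```python
# Variant residues of each wild-type clone, split at the internal restriction
# sites (segment order: ApaI | BstXI | BstXI | NcoI boundaries).
WILD_TYPE = {
    "GR": ["VL", "LYEWIT", "SVGRLDARVRVL", "SDVIIRDSI", "EI"],
    "YG": ["IL", "LFDSLA", "NISKINTQIRAL", "SDVIVGDSV", "EK"],
    "YE": ["IK", "IFDSLA", "NVSKINTQIEAV", "SEIVIGDSI", "EI"],
    "OR": ["IK", "IFDSLA", "NISKINTQIEAV", "GEIVIGVTI", "KI"],
}

def hybrid(parents, boundaries):
    """Join segments from successive parents, switching at each boundary."""
    if len(parents) != len(boundaries) + 1:
        raise ValueError("need one more parent than boundaries")
    edges = [0, *boundaries, 5]
    segments = []
    for parent, start, end in zip(parents, edges, edges[1:]):
        segments.extend(WILD_TYPE[parent][start:end])
    return ".".join(segments)

# Set 1 (NcoI, BamHI): only the last two variant positions are exchanged.
luc_1a = hybrid(["GR", "YG"], [4])
# Set 2 (ApaI, XhoI): only the first two variant positions are exchanged.
luc_2b = hybrid(["OR", "GR"], [1])
# Set 3 (two BstXI sites): the central segment is exchanged.
luc_3e = hybrid(["YG", "OR", "YG"], [2, 3])
```

The green-emitting clone cannot take part in the set 3 scheme because it lacks the first BstXI site; the sketch does not enforce that restriction.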

To simplify nomenclature, we refer to the 
green-emitting clone as luc-GR, the yellow- 
green-emitting clone as luc-YG, the yellow- 
emitting clone as luc-YE, and the orange-emitting 
clone as luc-OR. The hybrids are referred to as 
'luc-' followed by a number and a letter. The
number refers to the set of hybrids which the 
clone is derived from, and the letter is an 
arbitrary distinction between the members of the 
set. Thus luc-2d is a hybrid that came from the 
second set of hybrids. A group of amino acid
substitutions is referred to as a set of changes,
and is indicated as shown in this example:
[R223,L238→E,V] indicates that arginine at position
223 changes to glutamate, and that leucine at
position 238 changes to valine. The inverse set
would be changes in the opposite direction, i.e.
[E223,V238→R,L]. The amino acid substitutions
required to change one luciferase into another are
indicated as shown: luc-GR→2b [V9,L21→I,K]
indicates the necessary amino acid substitutions
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Wild Type 

luc-GR   GR         VL.LYEWIT.SVGRLDARVRVL.SDVIIRDSI.EI   546 nm
luc-YG   YG         IL.LFDSLA.NISKINTQIRAL.SDVIVGDSV.EK   560 nm
luc-YE   YE         IK.IFDSLA.NVSKINTQIEAV.SEIVIGDSI.EI   578 nm
luc-OR   OR         IK.IFDSLA.NISKINTQIEAV.GEIVIGVTI.KI   593 nm

NcoI, BamHI hybrids

luc-1a   GR*YG      VL.LYEWIT.SVGRLDARVRVL.SDVIIRDSI*EK   546 nm
luc-1b   GR*OR      VL.LYEWIT.SVGRLDARVRVL.SDVIIRDSI*KI   546 nm
luc-1c   YG*GR      IL.LFDSLA.NISKINTQIRAL.SDVIVGDSV*EI   559 nm
luc-1d   YG*OR      IL.LFDSLA.NISKINTQIRAL.SDVIVGDSV*KI   561 nm
luc-1e   YE*YG      IK.IFDSLA.NVSKINTQIEAV.SEIVIGDSI*EK   579 nm
luc-1f   YE*OR      IK.IFDSLA.NVSKINTQIEAV.SEIVIGDSI*KI   578 nm
luc-1g   OR*GR      IK.IFDSLA.NISKINTQIEAV.GEIVIGVTI*EI   593 nm
luc-1h   OR*YG      IK.IFDSLA.NISKINTQIEAV.GEIVIGVTI*EK   593 nm

ApaI, XhoI hybrids

luc-2a   YG*GR      IL*LYEWIT.SVGRLDARVRVL.SDVIIRDSI.EI   546 nm
luc-2b   OR*GR      IK*LYEWIT.SVGRLDARVRVL.SDVIIRDSI.EI   546 nm
luc-2c   GR*YG      VL*LFDSLA.NISKINTQIRAL.SDVIVGDSV.EK   561 nm
luc-2d   OR*YG      IK*LFDSLA.NISKINTQIRAL.SDVIVGDSV.EK   560 nm
luc-2e   GR*YE      VL*IFDSLA.NVSKINTQIEAV.SEIVIGDSI.EI   579 nm*
luc-2f   YG*YE      IL*IFDSLA.NVSKINTQIEAV.SEIVIGDSI.EI   578 nm
luc-2g   GR*OR      VL*IFDSLA.NISKINTQIEAV.GEIVIGVTI.KI   593 nm*
luc-2h   YG*OR      IL*IFDSLA.NISKINTQIEAV.GEIVIGVTI.KI   593 nm

BstXI hybrids

luc-3c   OR*YG*OR   IK.IFDSLA*NISKINTQIRAL*GEIVIGVTI.KI   580 nm
luc-3d   YE*YG*YE   IK.IFDSLA*NISKINTQIRAL*SEIVIGDSI.EI   563 nm
luc-3e   YG*OR*YG   IL.LFDSLA*NISKINTQIEAV*SDVIVGDSV.EK   577 nm
luc-3f   YE*OR*YE   IK.IFDSLA*NISKINTQIEAV*SEIVIGDSI.EI   578 nm
luc-3g   YG*YE*YG   IL.LFDSLA*NVSKINTQIEAV*SDVIVGDSV.EK   577 nm
luc-3h   OR*YE*OR   IK.IFDSLA*NVSKINTQIEAV*GEIVIGVTI.KI   594 nm

BstXI, BamHI hybrids

luc-4a   GR*YG      VL.LYEWIT.SVGRLDARVRVL*SDVIVGDSV.EK   550 nm
luc-4b   GR*YE      VL.LYEWIT.SVGRLDARVRVL*SEIVIGDSI.EI   553 nm
luc-4c   GR*OR      VL.LYEWIT.SVGRLDARVRVL*GEIVIGVTI.KI   571 nm
luc-4d   YG*GR      IL.LFDSLA.NISKINTQIRAL*SDVIIRDSI.EI   559 nm
luc-4e   OR*GR      IK.IFDSLA.NISKINTQIEAV*SDVIIRDSI.EI   573 nm

Figure 6. Sequence of hybrid luciferases. Each set of hybrids is indicated in bold type by the restriction endonuclease sites
used in its construction. Each line thereafter corresponds to a different hybrid within the set. The first entry of each line is the
hybrid name. The second entry indicates the origin of the fragments used to construct the hybrid, shown in their correct order.
The third entry shows, in succession, the amino acids at every position in the four luciferase sequences where there is at least
one difference between the sequences. The '.' in the series of amino acids indicates the positions of the restriction
endonuclease sites. The restriction sites actually used for the construction of a particular set of hybrids are indicated by '*'. The
last entry of each line indicates the wavelength at which the intensity maximum occurs for the spectrum of each hybrid luciferase.
Maxima values marked with an asterisk are from spectra that were very weak, so that the exact value is somewhat uncertain.



required to change the green-emitting luciferase 
to hybrid 2b. 

The results of the hybrid experiments can be 
best understood in terms of deviations from the 
native enzyme sequences for each luciferase. We 
will begin by examining luc-YG, whose spectrum is
most like that of the firefly luciferase. Changes to
both the first two and the last two variant positions
did not produce any change in the spectrum of
bioluminescence. These changes are luc-YG→2c
[I9→V], luc-YG→2d [L21→K], luc-YG→1c
[K484→I], and luc-YG→1d [E403,K484→K,I]. All



of these changes produced luciferases whose 
spectral maxima were at 560 nm (within the limits
of experimental error). By combining these sets, we
can form a new set, identified as luc-YG→YG',
which describes changes in luc-YG that have little
or no effect on its spectrum. This new set is
luc-YG→YG' [I9,L21,E403,K484→V,K,K,I]. For
three members of this set, the substituting amino
acids have very different chemical properties. The
change of E403 to K403 is one of only two
substitutions possible between any of the
luciferases where the charge on the amino acid
reverses. In luc-YG→3e [R223,L238→E,V] the
spectrum shifts from 560 nm to 577 nm, which is
indistinguishable from the spectrum of luc-YE.
Luc-YG→3g [I89,R223,L238→V,E,V] has the
same effect. The difference between these sets,
[I89→V], therefore apparently has no effect on
the spectrum and can be added to the set of
substitutions which do not affect luc-YG, i.e.
luc-YG→YG' [I9,L21,I89,E403,K484→V,K,V,K,I].

The difference between luc-YG→YE and
luc-YG→3e is [L21,L41,I89,D266,V282I283,V323,
V389,K484→K,I,V,E,IV,I,I,I]. By subtracting
luc-YG→YG' from this, the set
[L41,D266,V282I283,V323,V389→I,E,IV,I,I] is formed.
It may be inferred indirectly that this set also has
little effect on the spectrum of luc-YG, since the
spectra of luc-YE and luc-3e are virtually identical.
This can be tested directly by luc-YG→3d, which after
removing the members of luc-YG→YG' forms
this set inferred to have little effect on the
spectrum. The spectral maximum of hybrid luc-3d is
563 nm, 3 nm higher than luc-YG. Thus, while
this verifies that this set has little effect on the
spectrum, it does have a measurable effect. This
also demonstrates that the amino acid substitutions
which produce most of the spectral shift
between luc-YG and luc-YE are in the set
[R223,L238→E,V], and that this set is both
necessary and sufficient for that shift.
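
The bookkeeping in this paragraph is a set difference on substitutions, which can be sketched as follows. The dictionaries are transcribed from the sets quoted in the text; the helper names `subtract` and `fmt` are hypothetical, and the position written here as D266 is an assumption, since this copy prints it as both D226 and D266.

```python
def subtract(changes, known):
    """Remove from `changes` every substitution that also appears in `known`."""
    return {pos: new for pos, new in changes.items() if known.get(pos) != new}

def fmt(changes):
    """Render a set of changes in the bracket notation used in the text."""
    return "[" + ",".join(changes) + "\u2192" + ",".join(changes.values()) + "]"

# luc-YG -> YE: every substitution separating the two wild-type clones.
YG_TO_YE = {"L21": "K", "L41": "I", "I89": "V", "R223": "E", "L238": "V",
            "D266": "E", "V282I283": "IV", "V323": "I", "V389": "I", "K484": "I"}
# luc-YG -> 3e: the pair that produces most of the shift to 577 nm.
YG_TO_3E = {"R223": "E", "L238": "V"}
# luc-YG -> YG': substitutions found to leave the spectrum unchanged.
YG_NEUTRAL = {"I9": "V", "L21": "K", "I89": "V", "E403": "K", "K484": "I"}

difference = subtract(YG_TO_YE, YG_TO_3E)
small_effect = subtract(difference, YG_NEUTRAL)
```

Here `fmt(small_effect)` gives the five-position set that hybrid luc-3d tests directly.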

As was found for luc-YG, changes to the
first two and last two variant positions of luc-YE
have no effect on its spectrum. Combining
luc-YE→1e, luc-YE→1f, luc-YE→2e, and
luc-YE→2f, we get the set luc-YE→YE'
[I9,K21,E403,I484→V,L,K,K]. Luc-YE→3f [V89→I] also
produces no shift in the spectrum and so,
as with luc-YG, it can be included in luc-YE→YE'
to form [I9,K21,V89,E403,I484→V,L,I,K,K].

Since luc-YE→OR [V89,S247,D352,S358,E403→
I,G,V,T,K] with the members of luc-YE→YE'
removed is [S247,D352,S358→G,V,T], the amino
acid substitutions that shift the spectrum of luc-YE
to that of luc-OR must be in this subset. By reasoning
similar to that used to identify the set which can
shift the spectrum of luc-YG to that of luc-YE, the
inverse of that set must contain the amino acid
substitutions which can shift the spectrum of luc-YE
nearly to that of luc-YG, i.e. [E223,V238→R,L] shifts
the spectrum of luc-YE to 563 nm.
The spectral shifts caused by
[S247,D352,S358→G,V,T] and [R223,L238→E,V]
can be shown to act largely independently of each
other by examining luc-YG→OR [L21,L41,R223,L238,
S247,D266,V282I283,V323,D352,S358,V389,E403,K484→
K,I,E,V,G,E,IV,I,V,T,I,K,I]. After
removing the members of luc-YG→YG', this
set can be divided into three subsets:
[R223,L238→E,V], [S247,D352,S358→G,V,T], and
[L41,D266,V282I283,V323,V389→I,E,IV,I,I]. The
first set is the set of substitutions shown to be
largely responsible for the spectral shift of luc-YG
to luc-YE, and the second set are those responsible
for the spectral shift of luc-YE to luc-OR. The
third set is the set shown to have a small but
measurable effect on the spectrum of luc-YG.
Therefore, the spectral shift from luc-YG to
luc-OR, 33 nm, appears to rely on both of the first
two sets of effective substitutions. It can be shown
that these two sets act independently by applying
each to luc-YG. The effect of [R223,L238→E,V] is
already described by luc-YG→3e and produces a
spectral shift of 17 nm. Luc-YG→3c, after removing
members of luc-YG→YG' and members of
the set which has a small effect on the spectrum of
luc-YG, is the set [S247,D352,S358→G,V,T].
Hybrid 3c is shifted 20 nm from luc-YG. The sum
of the effects of [R223,L238→E,V] and
[S247,D352,S358→G,V,T] is 37 nm, about 10%
greater than the spectral shift of luc-YG→OR.
This is fairly good agreement, especially when
acknowledging that luc-YG→OR has substitutions
which contribute small effects that are not
in either [R223,L238→E,V] or [S247,D352,S358→
G,V,T]. Therefore, the effects of these two sets
of mutations are largely additive.
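
The additivity argument is simple arithmetic on the emission maxima listed in Figure 6. The short sketch below just restates it; the variable names are illustrative only.

```python
# Emission maxima in nm, taken from Figure 6.
LUC_YG, LUC_OR = 560, 593
LUC_3E = 577   # luc-YG carrying [R223,L238 -> E,V]
LUC_3C = 580   # luc-YG carrying [S247,D352,S358 -> G,V,T] plus minor changes

shift_first_set = LUC_3E - LUC_YG            # 17 nm
shift_second_set = LUC_3C - LUC_YG           # 20 nm
predicted = shift_first_set + shift_second_set
observed = LUC_OR - LUC_YG                   # 33 nm
relative_excess = (predicted - observed) / observed
```

The predicted 37 nm exceeds the observed 33 nm by 4 nm, roughly 12% of the observed shift, which the text rounds to about 10%.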

Determining which amino acids are responsible
for the green emission of luc-GR is a more
difficult problem because the set of changes which
describes this luciferase is so large. For example,
luc-YG→GR has 20 amino acid changes. The
problem is compounded by the lack of one of the
BstXI sites which have been so useful in analysing
the other three luciferases. Hence, our analysis of
luc-GR is so far only preliminary. The first two
and last two variant positions have been found to
have no effect on the spectrum of luc-GR; thus
combining luc-GR→1a, luc-GR→1b, luc-GR→2a,
and luc-GR→2b gives
luc-GR→GR' [V9,L21,E403,I484→I,K,K,K].
Luc-GR→4a, with luc-GR→GR' removed,
gives [I323,R351,I389→V,G,V], which shifts the
spectrum of luc-GR 4 nm to 550 nm. Luc-GR→4b,
with members of luc-GR→GR' removed,
gives [D266,V282I283,R351→E,IV,G],
which shifts the spectrum to 553 nm. The
difference between these two sets, i.e.
luc-4a→4b [D266,V282I283,V323,V389→E,IV,I,I],
shifts the spectrum 3 nm. This is a subset of a
set identified earlier to have a 3 nm spectral
shift in luc-YG→3d. Luc-GR→4c, after
removal of luc-GR→GR', gives
[S247,D266,V282I283,R351,D352,S358→G,E,IV,G,V,T], which
shifts the spectrum 25 nm to 571 nm. This set
can be divided into [D266,V282I283,R351→E,IV,G]
and [S247,D352,S358→G,V,T]. The first set is the
same as the effective set of luc-GR→4b, which
gives a 7 nm shift, and the second set is the
effective set of luc-YE→OR, which gives a 16 nm
shift. Thus, the sum of these effects, 23 nm, is
nearly the same as their combined effect. These
observations are consistent with the interpretation
that [I323,R351,I389→V,G,V] can effect a 4 nm
shift in the spectrum independent of other amino
acid changes. As an additional example,
luc-4e→3g, with removal of the amino acid changes
shown not to affect the spectra, gives
[L41,I323,R351,I389→I,V,G,V], which causes a
spectral shift of 4 nm, from 577 nm to 573 nm.
However, [I323,R351,I389→V,G,V] can also be
formed from luc-4d→YG with members of
luc-YG→YG' removed. But in this case there is
almost no change in the spectrum.



SUMMARY AND CONCLUSIONS 

We have generated four different types of cDNA 
clones from the ventral light organ of Pyrophorus 
plagiophthalamus. These clones can direct the
synthesis of luciferases in E. coli which are
distinguishable by the colour of bioluminescence 
they emit. Since the different colours are express- 
ible in a bacterial host, they cannot be due to 
post-translational modifications that are unique 
to beetles. Further, because of the many differ- 
ences between prokaryotes and eukaryotes, it is 
unlikely that there are any post-translational 
modifications responsible for the different col- 
ours. Thus the determinants of colour must be 
found among the amino acid differences in the 
sequences of the four luciferases. We have begun 
to identify these determinants by constructing and 
analysing hybrid luciferases, made by recombin- 
ing fragments of the four different types. The 
results have shown that there are two groups of 
amino acids that each can produce a greater than 
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16 nm change in the spectrum of a luciferase. 
There are at least two other groups of amino acids
that can cause smaller changes in the spectra, and
several amino acids which have virtually no effect
on the spectra. The two sets of amino acids can be
expressed as directional substitutions,
[R223,L238→E,V] and [S247,D352,S358→G,V,T],
which in this form result in a large spectral shift
towards longer wavelengths. It was shown that
the effects of these two sets of changes act
largely independently of each other. For the other
sets of changes that cause smaller effects, the
independence of their action is less clear.

It is not known by this analysis whether all
amino acids in each set are required to effectuate
a change. To determine this we are initiating
site-directed mutagenesis to produce changes at
only single sites. We anticipate that the amino
acids producing the largest effect on the spectra
will be those which have the largest changes in
their chemical properties. Thus, in the set
[R223,L238→E,V], the effect is probably attributable
mostly to R223→E, which is a reversal of
charge. In the set [S247,D352,S358→G,V,T], the
effect is probably due to D352→V, which changes
a negative charge to a hydrophobic residue. But
S247→G may also be effective since it is a loss of
hydrogen bonding.

All beetle luciferases have evolved from a common ancestor: they all use ATP, O2, and a
common luciferin as substrates. The most studied of these luciferases is that derived from the
firefly Photinus pyralis, a beetle in the superfamily of Cantharoidea. The sensitivity with
which the activity of this enzyme can be assayed has made it useful in the measurement of
minute concentrations of ATP. With the cloning of the cDNA coding this luciferase, it has also
found wide application in molecular biology as a reporter gene. We have recently cloned
other cDNAs that code for luciferases from the bioluminescent click beetle, Pyrophorus
plagiophthalamus, in the superfamily Elateroidea. These newly acquired luciferases are of at
least four different types, distinguishable by their ability to emit different colours of
bioluminescence ranging from green to orange. Unique properties of these luciferases,
especially their emission of multiple colours, may make them additionally useful in
applications.

Keywords: Firefly luciferase; click beetle luciferases; reporter genes



INTRODUCTION 

Man's perception of the world is visually
oriented. Since bioluminescence is one of the few
things that can be seen in the dark, it is
understandable that this has been a topic of
biochemical research for many decades. Fireflies
have been prominent in this research endeavour,
in part because they are abundant and their light
organs are replete with luciferase. Thus they
provided a plentiful resource for further study.
Early research on fireflies was done primarily to
better understand this peculiar phenomenon of
living light. Sometimes, though, the earliest work
was justified as a means of developing artificial
lighting. In the late 1940s, when the general
importance of ATP metabolism was just becoming
recognised, it was discovered that ATP was a
component in the luminescent reaction of fireflies.
The firefly luciferase became one of the
paradigms of ATP-utilizing enzymes. Because of
the extreme sensitivity with which the activity of
this enzyme could be assayed, it was soon adapted
as a tool in the measurement of very low
concentrations of ATP. Subsequently, luciferase
was combined with other ATP-utilizing enzymes
to produce coupled enzymatic systems. In these
systems, the luciferase was the reporter allowing
sensitive measurements of a wide variety of
metabolites.

Recently, firefly luciferase has found new
application as a reporter of genetic activity in
living cells. Its application in this area was made
possible by the cloning of its cDNA, which can
direct the synthesis of active enzyme in foreign
hosts (de Wet et al., 1985). In this report, we
describe briefly how the properties of luciferase
have made it well suited for this purpose. We also
present information on the recent cloning of
several new cDNAs from a bioluminescent click
beetle (Wood et al., 198[?]a). These cDNA clones
encode four different types of luciferases,
which can be distinguished by their ability to emit
different colours of bioluminescence. These colours
range from green to orange. Structurally, the
click beetle luciferases differ significantly from
the firefly luciferase, and these differences are
reflected in their chemical properties. Because of
this, the click beetle luciferases may have
additional features to make them useful as genetic
reporters.



PROPERTIES OF THE FIREFLY LUCIFERASE

All beetle luciferases catalyse the conversion of
chemical energy into light by a two-step process
(Fig. 1) (Seliger and McElroy, 1964; DeLuca and
McElroy, 1978). This process utilizes ATP, O2,
and beetle luciferin, a unique heterocyclic acid
found only in bioluminescent beetles. In the first
step, the carboxylate group of luciferin is
activated by acylation with the alpha-phosphate
of ATP. The luciferyl adenylate is then oxidized
with molecular oxygen, in the second step, to
yield AMP, carbon dioxide, and oxyluciferin. The
oxyluciferin is generated in an electronically
excited state which, upon transition to the ground
state, emits the photon characteristic of
bioluminescence. For firefly luciferase, the most
studied of the beetle luciferases, the quantum
yield for this reaction has been measured at 0.68
relative to the consumption of luciferin (McElroy
and Seliger, 1960). This is the highest yield
reported for any luminescent reaction.

Under optimal conditions the firefly luciferase
emits light whose peak intensity is at 561 nm
(yellow-green). This is the same as the light
emitted from live fireflies. Under a variety of
conditions, however, the structure of luciferase
can be altered to a form which emits predominately
at 617 nm (red) (Seliger and McElroy,
1964). Some conditions which can cause this
spectral shift are pH below 7.5, temperature
above 20 °C, the presence of denaturants such as
urea, and the presence of heavy metals such as
Zn2+, Cd2+, or Hg2+ (Seliger and McElroy,
1964). Some chemical modifications of the
enzyme, or the use of substrate analogues, can
also cause the enzyme to emit red light (DeLuca
et al., 1973; Alter and DeLuca, 1986). In the case
of pH, the shift to red light is associated with a
substantial decrease in the quantum yield of the
reaction (McElroy and Seliger, 1966). This
decrease in quantum yield is probably evident
under any condition that promotes the red-emitting
form. The spectral shift associated with
changes in temperature, or the presence of
denaturants, can be interpreted as resulting from
partial unfolding of the enzyme structure. For
other conditions, it is not known whether the
effects are localized to key reactive residues, or
whether they also cause general perturbations to
the structure. Aside from the actual decrease in
the quantum efficiency of luciferase in the
red-emitting form, the spectral shift also causes
an apparent decrease in enzymatic activity. This is
because photomultiplier tubes are generally much
less sensitive to red light than green light. Both
these real and apparent effects combine to give a
large pH dependence to the measured light
output of firefly luciferase. The optimal pH for
light output is pH 7.8.

Under conditions of excess substrates, the light
output of luciferase is proportional to the
concentration of enzyme over at least a 10,000-fold
range. With the use of a sensitive luminometer,
as little as 10 femtograms of enzyme can
be detected (10^ molecules). Initiation of the
luminescent reaction by rapid mixing of the
substrates with the enzyme results in a rapid
release of light which reaches maximum intensity
after about 0.3 seconds. The intensity is then
quickly inhibited in a biphasic manner, reaching
about 10% of its peak value after 30 seconds.
Beyond one minute the intensity is produced
from a steady-state process which decays slowly
over several hours. Because the peak luminescence
is typically over 10-fold greater than the
steady-state luminescence, the enzymatic assay is
most sensitive when the first 30 seconds of the
reaction are included in the measurement of total
light output. This is typically accomplished in a
luminometer where the reaction is initiated in
front of the photomultiplier tube by injection of
substrates directly into the sample holder.
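
The exponent of the molecule count quoted for the detection limit is not legible in this copy, but its order of magnitude can be estimated. The sketch below is a rough calculation under an explicit assumption: the molar mass is approximated as 550 residues times an average residue mass of about 110 g/mol, a common rule of thumb rather than a value taken from the text.

```python
AVOGADRO = 6.02214076e23          # molecules per mole

def molecules(mass_g, molar_mass_g_per_mol):
    """Number of molecules in a given mass of a pure substance."""
    return mass_g / molar_mass_g_per_mol * AVOGADRO

# Assumption: 550 amino acids x ~110 g/mol per residue (rule of thumb).
approx_molar_mass = 550 * 110.0   # about 60,500 g/mol
detection_limit_g = 10e-15        # 10 femtograms, as stated in the text

n_molecules = molecules(detection_limit_g, approx_molar_mass)
order_of_magnitude = round(n_molecules / 1e5, 2)   # in units of 10^5
```

Under that assumption the detection limit corresponds to roughly 10^5 enzyme molecules.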

Firefly luciferase is a kD enzyme which
apparently is active as a monomer. It is coded by
a 550 amino acid reading frame in its mRNA, and
is probably produced as an active enzyme without
the necessity of post-translational modifications.
In the firefly, the enzyme is located in specialised
peroxisomes of the light organ (Keller et al.,
1987). It is directed to these sub-cellular
organelles by a targeting sequence at the C
terminus of the protein (Gould and Subramani,
1988b, 1988c). This targeting sequence is conserved
throughout eukaryotes, and will cause the
luciferase to localize in the peroxisomes of other
organisms when expressed in exogenous hosts.



APPLICATION OF FIREFLY LUCIFERASE AS A
REPORTER GENE

With the past decade have come dramatic
advancements in our ability to manipulate genetic
materials. Enabled by these new techniques, the
study of sub-cellular events which regulate
genetic activity has become one of the largest
areas of research today. A key tool in this area of
research is the reporter gene, which provides an
observable parameter in the monitoring of
genetic events at a molecular level. In its simplest
form, a reporter gene is a fragment of DNA
which encodes an easily detectable protein.
This protein is the reporter. In experiments, the
reporter gene is linked to other fragments of
DNA which are thought to contain genetic
control elements, and the assemblage is introduced
into living cells. Production of the reporter
in the cell is regulated by the action of the control
elements on the transcriptional activity of the
reporter gene. Thus, the reporter is the observable
parameter allowing the experimenter to
monitor the action of the control elements.

In practice, the transcriptional activity of a
reporter gene can be quite low, and experiments
are often limited by an inability to detect the
reporter. Therefore, for a reporter to be widely
useful, it must be detectable in very low
concentrations. In addition, the reporter must be
detectable by a method that can distinguish it
from other proteins native to the host cell. Firefly
luciferase meets these criteria ideally. It can be
detected in very small amounts through its
bioluminescent activity, and since bioluminescence
is not a common event in living systems, its
activity will be unique in the experimental host.
That is, there is no endogenous luminescent
activity of the host to interfere with the detection
of even minuscule amounts of luciferase. The
bacterial enzyme, chloramphenicol acetyltransferase
(CAT), has been used conventionally as a
reporter in eukaryotic systems for similar reasons.
Its enzymatic activity is not found in eukaryotic
cells, so CAT can also be detected without
confusion from host activities. Its assay is based
on conversion of the antibiotic chloramphenicol
to mono- and di-acetylated forms. High sensitivity
is provided by the use of 14C-labelled
chloramphenicol as the substrate. This method
requires that the products of the reaction be
separated from the substrate before quantification,
usually by thin layer chromatography or
HPLC.

Because CAT is widely used as a genetic
reporter, it was used as a benchmark to evaluate
the suitability of firefly luciferase in this
application (de Wet et al., 1987). It was found that the
levels of expression of CAT and luciferase in
eukaryotic systems were comparable. It had been
previously shown that the production of CAT in
eukaryotic cells is proportional to mRNA
transcription from the reporter gene. Since luciferase
production paralleled CAT production under a
variety of experimental conditions, luciferase
must also be a proportional indicator of
transcriptional activity. However, because of the efficient
detection methods achievable with bioluminescence,
the assay of luciferase is 100 to 1000 times
more sensitive. Thus, much lower levels of
genetic activity are detectable. Furthermore, the
time required to assay luciferase is much less than
CAT. Using a luminometer or scintillation
counter, the luciferase assay requires about a
minute per sample. The CAT assay usually
requires several hours. The assay of luciferase
also does not require the special precautions
needed for radioactive 14C.

One of the unique advantages of firefly
luciferase as a reporter of genetic activity is its
potential to measure this activity from within
living cells. This is not possible with use of CAT
since the products of the reaction require
separation from the assay mixture in order to be
quantified. The photons produced in the
luciferase reaction, however, are generally able to
pass from within the host cell to allow external
detection. A precondition of this is that the
luciferin substrate be able to pass into the cell to
combine with the luciferase reporter. The other
substrates of the reaction, ATP and O2, are
readily available in the interior of the cell. The
mere addition of luciferin to the external media is
sufficient to allow its passage across the cellular
membrane. But the light output elicited by this
method is less than expected given the extent of
luciferase contained within the cells. Light output
can be increased with the use of permeabilizing
agents such as DMSO or nigericin (Gould and
Subramani, 1988a), but still not to the full
potential expected. It is not known whether
permeability of the outer membrane is the only
limitation, or whether there are other inhibitions
of activity. One possibility is that the peroxisomal
membrane acts as a second barrier to luciferin
passage. Since luciferase is localized into
peroxisomes, most of the luminescent activity may arise
from these organelles. Experiments are currently
under way to remove the peroxisomal targeting
signal from luciferase so that it will remain in the
cytoplasm. This may improve its ability to elicit
luminescence from within intact cells. However,
other possibilities exist, such as unfavourable
microenvironmental effects, which could inhibit
the activity of luciferase in a foreign host.

Since the first published reports of its use as a
genetic reporter, this new application of firefly
luciferase has received much interest. By the time
this article was written, we had received
approximately )00(> requests for the cDNA encoding
luciferase from other laboratories wishing to
apply it to their experimental systems. The
feedback from these other laboratories has been
quite positive. In most cases, researchers are
finding that the use of luciferase instead of CAT is
saving much time in the execution of their
experiments. The time saved is not only in the
much shorter assay time of luciferase, but also in
the time required for sufficient expression of the
reporter. Previously, production of the reporter
often was not detectable for 24 to 48 hours after
the reporter gene was introduced into cells.
Because of the much higher sensitivity of the
luciferase assay, expression of the reporter gene
can typically be measured after only a few hours.
In some cases, where expression of the reporter
was previously too low for detection under any
conditions, the use of luciferase has allowed
measurements to be made. To date, luciferase has
been expressed from its cDNA in almost every
living kingdom. It has been expressed in bacteria
(de Wet et al., 1985), yeast (Wood and DeLuca,
1987), dictyostelium (Howard et al., 1988),
mammalian cells (de Wet et al., 1987; Gould and
Subramani, 1988a), and plant cells (Ow et al.,
1986), as well as in transgenic mice (DiLella et al.,
1988; Crenshaw and Rosenfeld, 1988) and plants
(Ow et al., 1986).



COMPARISON OF FIREFLY AND CLICK
BEETLE LUCIFERASES

Bioluminescent beetles are found in two
superfamilies, Elateroidea and Cantharoidea (Lloyd,
1978). Fireflies are members of the family
Lampyridae in the superfamily Cantharoidea; as
indicated above, they have been the primary
source of a beetle luciferase because of their
abundance and accessibility. In the superfamily
Elateroidea, only the family Elateridae contains
bioluminescent members, which are more
commonly known as click beetles. This family of
beetles is one of the most widely distributed, with
species found in most areas of the world.
However, unlike Lampyridae, where nearly all of
the members are bioluminescent, only a small
percentage of Elateridae are so. Most of these are
located in the Caribbean and in South America.
Their taxonomy suggests that the click beetles are
the most evolutionarily distant of the
bioluminescent beetles from the fireflies (Crowson, 1981).
The time of divergence of the Elateroidea and
Cantharoidea superfamilies cannot be estimated
directly owing to a lack of fossils. But by
comparison of the morphological differences
between these groups of beetles, corroborated
with the fossil record of other beetles, it has been
estimated that these superfamilies diverged about
}2i> million years ago.

Morphologically the click beetles and fireflies
are quite distinct (Fig. 2). The click beetles have a
hard exoskeleton, and are often larger than the
fireflies. They can be recognised by a
characteristic behaviour they display when being constrained
or placed on their backs. They make an audible
'click' sound by forcibly arching their head
forward. Bioluminescent click beetles have two
sets of light organs. One pair is located on the
dorsal surface of the head. These light
organs emit long pulses of light when the beetles
are not in flight. The second set is a single organ
located in a cleft on the ventral surface of the
beetle between the mesothorax and abdomen.
This light organ also emits long pulses of light but
only when the beetle is in flight. On the ground
the cleft is closed and the light is extinguished.
For most species of bioluminescent click beetle,
the ventral organ emits light at a longer
wavelength than the dorsal organ. The position
and activity of the light organs in fireflies is quite
different. These beetles have one set of light
organs located on the ventral surface of the
abdomen, on the posterior sternites. They
generally emit short bursts of light in a pattern which is
indicative of the particular species.

One species of click beetle has been of
particular interest since its bioluminescence was
first studied in 1963. This species, Pyrophorus
plagiophthalamus, is an especially large click
beetle, being typically 3 cm in length. It was of
interest because its bioluminescence presents an
unusually large range of colours (Seliger et al.,
1964). Furthermore, the colours vary between
individuals, a property not found in fireflies. The
light of the dorsal organ is greenish in colour, but
varies between individuals from green (548 nm) to
yellow-green (565 nm). The ventral organ varies
over a much wider range, from green (547 nm) to
orange (594 nm) (Biggley et al., 1967). The
luciferases of these beetles were extracted to
determine the source of these different colours.
In extracts, the bioluminescence spectra were not
different from those of the living beetles. Analysis
showed that the different colours were not due to
alterations of the substrates of the reaction, which
are the same as utilised by the firefly luciferase. It
was concluded that the differences were due to
variation in the interaction of the substrates with
the luciferases (Seliger and McElroy, 1964).
Unfortunately, attempts to study these luciferases
further were limited by the difficulty of collecting
sufficient quantities of the beetles.




Figure 2. The general morphology of fireflies and click beetles. A firefly (Lampyridae) is shown in two views on the left, and a
click beetle (Elateridae) is shown on the right.
CLONING AND EXPRESSION OF cDNAs
ENCODING CLICK BEETLE LUCIFERASES

Expression of firefly luciferase in Escherichia coli
demonstrated that we could produce this enzyme
from a source which was easily grown in the
laboratory. Thus, in this case, the information
contained in the cDNA encoding firefly luciferase
was in itself sufficient to generate an active
enzyme in a foreign host. Application of this
technology to the click beetle luciferases could
circumvent the problems of collecting large
quantities of the beetles. The methods used in
cloning a cDNA which encodes luciferase require
only a small supply of the beetles, and they are
needed only once. Afterwards, bacterial hosts
generate the DNA and enzyme needed for
further study. Production of the click beetle
luciferases from cDNA clones has the additional
advantage that genetic variants of the enzyme,
such as those which produce the different colours
of bioluminescence, are generated in isolation of
one another. Enzymes isolated from the click
beetles directly would require methods for
separation of the different variants. This would be
difficult since, as was subsequently found, the
physical differences between these variants are
few. Furthermore, the amino acid sequences of
the luciferases can be determined from the DNA
sequences of their cDNA clones. DNA sequencing
is an efficient technique making it practicable
to determine the amino acid sequence differences
between several proteins of over 500 amino acids
each.

Specimens of Pyrophorus plagiophthalamus,
collected from the northeast end of Jamaica, were
transported live to the laboratory and frozen in
liquid N2. Messenger RNA was isolated from
ventral light organs of approximately 60 beetles,
one microgram of which was converted to cDNA
(de Wet et al., 1986). This was packaged in a
specialized lambda cloning vector, Lambda ZAP,
to yield 700,000 recombinant plaques (Fig. 3)
(Short et al., 1988). We had originally intended to
screen the library by DNA hybridization using the
cDNA sequence of firefly luciferase. However,
attempts at visualizing the click beetle luciferase
gene in Southern blots, using the firefly luciferase
cDNA as the probe, failed to demonstrate
cross-hybridization even under conditions of low
stringency. It had been previously shown that
antibodies raised against firefly luciferase can
cross-react with the click beetle luciferases




Figure 3. The strategy for cloning and expressing the cDNAs
encoding click beetle luciferases. The upper corner depicts
the lambda ZAP vector. The cDNA library in this form was
screened for the expression of antigenic polypeptides (Ag).
With the use of M13 helper, Lambda ZAP was transformed
either individually or en bloc into Bluescript plasmids (upper
right). The cDNA library in this form was screened for the
expression of luminescence (hν). The expression of
luminescence was improved by transferring the cDNA clones to a
vector containing the tac promoter (lower left and bottom).



(Wienhausen and DeLuca, 1985). Thus we used
such antibodies, raised in rabbits, to screen the
cDNA library. Because the cross-reactivity with
click beetle luciferases is weak, we used
antibodies that had been affinity purified by selection
on a Sepharose column containing immobilized
firefly luciferase (a gift from Dr Gilbert Keller;
Keller et al.).

The original screening was done on
unamplified aliquots of the lambda cDNA library.
Determined by the colorimetric detection of
alkaline phosphatase conjugated to anti-rabbit
antibodies, 5.5% of the recombinant lambda
phage expressed luciferase antigens. Eighteen
clones were chosen for further analysis. A unique
feature of the Lambda ZAP cloning vector is that
it can be transformed into a bacterial expression
plasmid (Bluescript) by an in vivo process
involving the addition of an M13 helper phage
(Short et al., 1988). The recombinant lambda
containing the four longest cDNA clones were
transformed into their plasmid form, and found to
be able to express bioluminescence in E. coli. It
could be visually observed from the expression of
the four clones in E. coli that two produced
orange light, one produced yellow light, and one
produced yellow-green light.

In order to ascertain whether other colours of
bioluminescence could also be found in the
library, it was rescreened for other full-length
cDNA clones. The rescreening was done by a
different method designed to identify luminescent
activity directly. Five aliquots of the original
library were amplified, then transformed en bloc
into expression plasmids. As in the case of
eukaryotic cells expressing the firefly luciferase
(see above), bioluminescence can be initiated in
E. coli expressing luciferase by the addition of
luciferin to the media (Wood and DeLuca, 1987).
In bacteria, the diffusion of luciferin through the
membranes can be facilitated by reducing the pH
of the media to 5. Presumably this masks negative
charges on the molecule, making it more
hydrophobic and permeable to a lipid bilayer. By
adding luciferin to bacterial colonies containing
clones of the cDNA library, colonies able to
express a functional luciferase were identified
directly by their ability to darken X-ray film.
Several bioluminescent colonies were isolated
from each aliquot of the library, seven of which
were identified as arising from independent
cDNA clones. From two of the aliquots, two
colonies could be judged as resulting from
independent clones based on widely different
intensities. The independence of these clones was
later confirmed by restriction mapping. From
these clones, five emit yellow light, one emits
orange light, and one emits a new colour, green.

Western blot analysis was performed to
confirm that full-length click beetle luciferases were
being expressed in the E. coli. Despite the fact
that some of these clones were clearly visualized
by anti-firefly luciferase antibody during the
library screening, we were unable to detect the
gene products in blots made directly with E. coli
lysates. This is the result of both a low level of
gene expression, and a weak cross-reactivity with
the antibody. The expression of luminescence was
increased by transferring the cDNA clones to a
plasmid vector which incorporated a tac promoter
(Fig. 3). A lysate from E. coli expressing the
green-emitting luciferase from the tac vector
further required partial purification to be
detectable in a blot. The blot revealed a single band,
cross-reactive with firefly luciferase, which
comigrates with the native click beetle luciferase (Fig.
4). DNA sequence analysis was later performed




Figure 4. Western blot showing the expression of click
beetle luciferase in E. coli. Lane 1: partially purified extract
of E. coli expressing the green-emitting luciferase. Lane 2:
extract of click beetle light organ. Lane 3: purified firefly
luciferase. Luciferases were detected with anti-firefly
luciferase
for one clone of each colour. This confirmed that
each cDNA contained an open reading frame
which could code for a protein whose N-terminus
corresponded to the N-terminus of firefly
luciferase. Thus, as has been achieved previously
with the firefly luciferase, the click beetle
luciferases can be produced in E. coli as
full-length and enzymatically active enzymes.



BIOLUMINESCENCE SPECTRA OF CLICK
BEETLE LUCIFERASES

Spectrographic analysis was performed on the
bioluminescence emitted from E. coli expressing
the various cDNA clones. Bioluminescence was
induced from whole cells by the same method
used previously in the screening of bacterial
colonies for luminescence. Cells producing
luciferase from the tac vector yielded sufficient
light intensity, upon addition of luciferin to the
media, to allow spectral measurements. These
measurements verified the visual observation that
the eleven clones can be sorted into four groups
based on the colour of light emitted. For each of
the colours, the spectrum is a single peak
qualitatively similar to the spectra of native click
beetle luciferase (Seliger et al., 1964). When the
spectra of the four colours are superimposed,
they show a remarkable pattern of four similarly
shaped peaks that are nearly evenly spaced (Fig.
5). The wavelengths of maximum intensity are
546 nm for green, 560 nm for yellow-green,
578 nm for yellow, and 594 nm for orange.
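
The "nearly evenly spaced" observation can be checked with simple arithmetic on the four peak wavelengths quoted above. The following is only an illustrative sketch; the function name and the dictionary are introduced here for the example and are not part of the original work.

```python
# Peak wavelengths (nm) as quoted in the text for the four colour groups.
peaks_nm = {"green": 546, "yellow-green": 560, "yellow": 578, "orange": 594}


def peak_spacings(peaks):
    """Return the gaps (nm) between consecutive peaks, ordered by wavelength."""
    ordered = sorted(peaks.values())
    return [b - a for a, b in zip(ordered, ordered[1:])]


spacings = peak_spacings(peaks_nm)
mean_spacing = sum(spacings) / len(spacings)
print(spacings, mean_spacing)
```

With these values the gaps between neighbouring peaks all fall within a few nanometres of their mean, which is what the text describes.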

Spectra were also measured from lysates of the
E. coli after partial purification (Wood et al.,




Figure 5. Spectra of bioluminescence emitted from E. coli
cells containing the click beetle luciferases. The intensity
maximum for each spectrum has been normalized.



1988a) (Fig. 6). Bioluminescence was elicited
from the lysates by diluting them 100-fold into a
reaction mixture ranging in pH from 6 to 10. For
pHs 6.0, 7.0, and 8.0, the reaction mixture was
buffered with 50 mmol/l MES/50 mmol/l MOPS/
50 mmol/l Tricine. For pHs 8.0, 9.0, and 10.0 it
was buffered with 50 mmol/l Tricine/50 mmol/l
CHES. Also in the reaction mixture were
5 mmol/l MgSO4/1 mmol/l EDTA/0.1 mmol/l
luciferin/1.5 mmol/l ATP/1 mmol/l NaF/0.2 mg/ml
BSA/10% glycerol. (NaF was found to simplify
the kinetics of the decay of luminescence, which
simplified the analysis of the spectral data. It does
not affect the spectral distribution. It is not
known whether it affects the activity of the click
beetle luciferase directly, or whether it is due to
an interaction with other components of the
lysate. It has no effect on the purified firefly
luciferase.) For the click beetle luciferases from
each of the four colours, the spectra measured
from whole cells matched that of the lysates at pH
6.0 and pH 7.0. Also, for each of the luciferases,
the spectra shifted towards longer wavelengths at
pH above 9.0. This shift was largest for the
green-emitting luciferase, less for the
yellow-emitting luciferase, and the least for the
yellow-green and orange-emitting luciferase. At pH 8.0,
this shift is virtually undetectable for the
yellow-green and orange-emitting luciferases. For the
green and yellow-emitting luciferases, the shift at
pH 8.0 can be detected as a slight widening of the
spectral peak, but the position of the maxima is
unchanged.

This pH response of the click beetle luciferases
is in contrast with that of the firefly luciferase. As
stated above, the spectrum of firefly luciferase
shifts to longer wavelengths at low pH (Fig. 7). In
the pH range of 8.0 to 10.0, the enzyme emits its
characteristic yellow-green light. The spectrum
shifts towards longer wavelengths at pH 7.0, and
at pH 6.0 is generated almost completely from a
red-light emitting form of the enzyme. This shift
is much larger than is seen with the click beetle
luciferases. At pH 7.0, where the spectrum of
firefly luciferase is a mixture of yellow-green and
red-emitting forms of the enzyme, a difference is
apparent between the enzyme purified from
fireflies and that produced in E. coli (Fig. 7).
With the luciferase from E. coli, the red
component of the spectrum is much less than for
the purified native enzyme. In addition, as the
light output of the reaction decays, the two
components of the spectrum do not decay at the
Figure 6. Spectra measured from partially purified lysates of
E. coli expressing the click beetle luciferases. The intensity
maximum for each spectrum has been normalized. The
colour emitted by each luciferase at neutral pH is indicated in
the corner of each plot. Spectra shown for pH 6.0 and 8.0
were measured in MES/MOPS/Tricine buffer. Spectra shown
for pH 9.0 and 10.0 were measured in Tricine/CHES buffer. At
pH 8.0, the spectrum measured in MES/MOPS/Tricine was
identical to that measured in Tricine/CHES.



same rate. The decay rate of these two
components is the same rate in the spectrum of the
native luciferase. The spectra of the luciferase
produced in E. coli are of samples that are only
partially purified by the method stated above. It
can be shown that the differences between this
and the native luciferase are not due to intrinsic
differences in the enzymes themselves, but
instead arise from the effects of the other
components in the bacterial lysate. If the native
luciferase is mixed with a lysate prepared from E.
coli which does not contain a luciferase cDNA
clone, the spectrum of the mixture is the same as
that of lysates containing the luciferase produced
in E. coli.

These observations reveal two aspects of the
effects of an E. coli lysate on the spectrum of
firefly luciferase. One is that the lysate contains a
component that causes luciferase to resist the
effects of pH on its spectrum. The other feature is




Figure 7. Spectra measured from purified native firefly
luciferase (upper plot) and from partially purified lysates of E.
coli expressing the firefly luciferase (lower plot). The intensity
maximum for each spectrum has been normalized. Spectra
shown for pH 6.0 and 7.0 were measured in MES/MOPS/
Tricine buffer. Spectra shown for pH 8.0 and 10.0 were
measured in Tricine/CHES buffer. At pH 8.0, the spectrum
measured in MES/MOPS/Tricine had a slightly greater
contribution from the red component than that measured in
Tricine/CHES.
that firefly luciferase, in the presence of the
lysate, is in at least two different forms
distinguished both by the colour of light emitted, and by
different decay rates of the light output. These
two features may indicate a single phenomenon.
That is, a factor in the lysates may be stabilizing
some of the luciferase molecules both to
destabilizing effects of low pH, and to the temporal loss
of enzymatic activity. The presence of the
bacterial lysate does not appear to affect the
spectral distribution of each of the components of
the firefly luciferase spectrum, just their relative
contribution to the total spectrum. While this
effect is most evident when the spectrum is
measured at pH 7.0, it is also evident at pHs 6.0
and 8.0. In these cases, however, the differences
are slight since the spectrum consists almost
entirely of a single component. Extrapolation of
these results to the spectra of the click beetle
luciferases indicates that their spectral
distributions in the pH range of 6 to 8 are probably not
affected by the lysate. This is true since the
spectrum of these luciferases is apparently a
single component in this pH range. But the pH
required to shift the spectra to longer wavelength
is potentially different than what would be
expected for purified enzymes. However, the
spectra of the green-emitting click beetle
luciferase at pH 9.0 or 10.0, which also consist of
two components, did not reveal the nonuniform
decay rate evident with the firefly luciferase at pH
7.0.



SEQUENCE COMPARISON OF CLICK BEETLE
AND FIREFLY LUCIFERASES

Our inability to demonstrate cross-hybridization
of their corresponding nucleic acid in Southern
blots suggested that a significant degree of
evolutionary divergence had occurred between the
firefly and click beetle luciferases. Sequence
analysis of the click beetle cDNA clones has
confirmed this. For a direct comparison with the
firefly luciferase, the yellow-green-emitting click
beetle luciferase was used since its spectral
maximum is at nearly the same wavelength. The
cDNA encoding this luciferase contains an open
reading frame corresponding to 543 amino acids.
This is seven amino acids less than that found with
the firefly luciferase cDNA. Alignment of the
amino acid sequences, deduced from the cDNA
sequences, reveals a 47% identity between the
two luciferases (Fig. 8). The difference in the
number of amino acids between the sequences is
mostly accounted for by six gaps in the sequence
alignment. These gaps are small, being one or two
amino acids in length and, for some, their exact
position is somewhat arbitrary.
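
For readers unfamiliar with how a percent identity is obtained from a gapped alignment, the calculation can be sketched as below. This is an illustration only: the text does not state which denominator was used, so the sketch assumes identity is counted over columns where neither sequence has a gap, and the two short sequences are hypothetical toy strings, not luciferase sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns in which the residues are identical.

    Assumption: columns containing a gap in either sequence are skipped.
    Other conventions (e.g. dividing by the full alignment length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100 * matches / compared


# Hypothetical toy alignment with one gap in the first sequence.
print(percent_identity("MKV-LA", "MRVELA"))
```

Because the exact placement of short gaps is "somewhat arbitrary", as noted above, the computed identity can shift slightly depending on the alignment chosen.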

Throughout the alignment there are no regions
of especially high sequence similarity. Thus there
is no indication of which regions may have been
conserved owing to catalytic or structural
constraints on the enzymes. An exception to this is in
the last three amino acids which, for the firefly
luciferase, have been shown to be necessary for
translocation into peroxisomes (Gould and
Subramani, 1988b, 1988c). Given the close functional
similarity of these enzymes, it is almost certain
that the click beetle luciferases are also located in
peroxisomes. The hydropathy plots of firefly and
click beetle luciferase show some similarities,
but overall appear to be quite different (Fig. 9)
(Kyte and Doolittle, 1982). The most apparent
similarity is the three large hydrophobic regions
found in both luciferases. Other regions of
hydrophobicity and hydrophilicity can be found in
common between the luciferases, but they are
largely obscured in as many differences.
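
A hydropathy plot of the Kyte and Doolittle type is a sliding-window average of per-residue hydropathy values along the sequence. The sketch below uses the published Kyte-Doolittle scale; the window length of 9 is an assumption made for illustration, since the text does not say what window was used for Fig. 9, and the test peptide is hypothetical.

```python
# Kyte-Doolittle hydropathy values (Kyte and Doolittle, 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 9) -> list:
    """Mean hydropathy of each window of `window` consecutive residues."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical peptide: a hydrophobic stretch followed by a charged stretch.
profile = hydropathy_profile("ILVDEK", window=3)
print(profile)
```

Positive values in the profile mark hydrophobic stretches and negative values hydrophilic ones, so two proteins can be compared by overlaying their profiles.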

In contrast to the low degree of similarity
between the firefly and click beetle luciferases,
the similarity between the various click beetle
luciferases is very high. Between the luciferases
which are capable of emitting different colours,
the amino acid sequences are from 95% to 99%
identical (Wood et al., 1988b). Since the only
differences between these luciferases are the
amino acid sequences, the determinants of colour
must be found in the relatively few differences
between the sequences. We have begun to
examine exactly which of the amino acids can
affect the colour of light, and have found that not
all of the differences between the clones are
effective. In some cases, the amino acid
determinants of colour may be as few as two or three.
This work will be presented elsewhere.



USE OF CLICK BEETLE LUCIFERASES AS
GENETIC REPORTERS

It appears that the click beetle luciferases will
have all the advantages of the firefly luciferase in
their application as reporters of genetic activity.
Advantages such as the sensitivity with which
they can be detected, or the ability to detect them
in living cells, are evident. Other features of the
luciferases, such as their specific activity, or the
linearity of their assay with respect to enzyme
concentration, have yet to be established. We are
at present improving the expression of these
luciferases in E. coli to provide a source of
enzyme for better characterization. It may be
expected with the large difference in sequence
between the click beetle and firefly luciferases,
that these luciferases also have significant
differences in their chemical properties as well. This is
supported by the dramatic difference in the
response of their spectra to changes in pH. By
qualitative observation, temperatures above 40 °C
or the presence of Zn2+ do not cause changes in
the spectra of the click beetle luciferases. As
noted above, these conditions will cause the
firefly luciferase to emit red light. In fact, the
temperature optimum for the click beetle
luciferases may be higher than for the firefly
luciferase. Other initial experiments suggest that
the click beetle luciferases may be more resistant
to denaturation by charged detergents, or
activation by neutral detergents (Kricka and DeLuca,
1982). Collectively, these observations suggest
that the activity of the click beetle luciferases may
be less sensitive to environmental conditions.
However, these conclusions are tentative since
they were made from luciferases in the presence
of other components of the E. coli lysate.

A novel feature of the click beetle clones is the
ability to distinguish between them by the colour
of light emitted. This may make them particularly
useful as genetic reporters where multiple
reporters are desirable. Because their respective
sequences differ by only a few amino acids,
characteristics of their expression in exogenous
hosts should also differ little. The differences in
the colour of light would normally have no effect
on the hosts, but regardless, expression of a
luciferase reporter gene is generally done in the
absence of the luciferin substrate. Thus there is
no luminescent activity until the actual time of the
luciferase assay. The spectral distributions of the
luciferases are rather broad, which would limit the
ability to distinguish each luciferase in a mixture if
their respective amounts vary widely. The
greatest distinction can be made between the
green and orange-emitting clones, which should
be distinguishable in a luminometer with the use
of optical cut-off filters. From calculations based
on the overlap of their spectra alone, and
assuming a coefficient of variation of 4% in the
assay of luminescence, this method should allow
the detection of one of the luciferases in the
presence of a 25-fold excess of the other. Since
the colours of these luciferases are not easily
altered by pH or temperature, it should be
possible to distinguish these luciferases in vivo as
well as in vitro. This type of dual reporter gene
system would allow one to monitor different
promoters within a single host, or to follow
different populations of cells simultaneously,
each labelled with a different luciferase. The
structural similarity of the luciferases increases
confidence that differential effects noted in an
experiment are properties of the system being
observed, and are not artefacts due to individual
peculiarities of the reporter genes themselves.
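
One way to picture how two luciferases with overlapping spectra could be resolved with a cut-off filter is as a pair of linear equations: one reading without the filter and one reading through it. The sketch below is only a schematic of that idea, not the calculation used by the authors. The transmitted fractions (0.1 for the green emitter, 0.7 for the orange emitter) are hypothetical values chosen for illustration; in practice they would be measured from each luciferase alone.

```python
def unmix_two_reporters(total, filtered, frac_a, frac_b):
    """Estimate the light contributed by reporters A and B.

    Model (an assumption of this sketch):
        total    = a + b                      (reading without a filter)
        filtered = frac_a * a + frac_b * b    (reading through the cut-off filter)
    where frac_a and frac_b are the fractions of each reporter's light
    that pass the filter.
    """
    if frac_a == frac_b:
        raise ValueError("the filter must transmit the two reporters differently")
    b = (filtered - frac_a * total) / (frac_b - frac_a)
    a = total - b
    return a, b


# Hypothetical mixture: 25 units of green light and 1 unit of orange light.
green, orange = 25.0, 1.0
total_reading = green + orange
filtered_reading = 0.1 * green + 0.7 * orange
print(unmix_two_reporters(total_reading, filtered_reading, 0.1, 0.7))
```

The closer the two transmitted fractions are to each other, the more any measurement error is amplified in the estimates, which is consistent with the text's point that broad, overlapping spectra limit the ratio over which the two luciferases can be distinguished.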



SUMMARY

Firefly luciferase has been used as a tool of
scientific investigation for over two decades
because of the high sensitivity with which its
enzymatic activity can be assayed. With the
advent of techniques in nucleic acid
manipulations, it has found its newest area of application as
a reporter of genetic activity within living cells. In
addition to high sensitivity, its assay is rapid and
does not require complex procedures or
precautions. In comparison to the CAT assay, firefly
luciferase has been shown to be well suited as a
genetic reporter. But, whereas previously firefly
luciferase was the epitome of beetle luciferases
because of its availability, cloning techniques
have made feasible the study of other luciferases
of this type. Some of these luciferases may have
additional features enhancing their use as
reporters, or in other applications. Our recent cloning
of several luciferases from a bioluminescent click
beetle substantiates this possibility. These
luciferases are unique in the ability to produce
bioluminescence of several different colours. In
addition, the sequence of these luciferases is
considerably different from that of the firefly
luciferase, suggesting that other chemical
properties of these enzymes will be different. One area
where such differences are apparent is in the
response of the bioluminescence spectra to
changes in pH. We are currently investigating
other properties of these new luciferases to better
understand their general nature and to determine
their suitability in applications.
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Luc Genes: Introduction cjf Colour into 
Bioluminescence Assays 



Department of Chemistry, M-001, University of California, San Diego, La Jolla, CA 92093. 



Luminescence assays are generally based on measurements of light intensity alone. 
Inclusion of colour as an additional parameter of the assay could increase the informa- 
tion content. Colour variation in luminescence is particularly prevalent among beetle 
luciferases. To study the relationship between enzyme structure and colour, luciferases 
from a Jamaican click beetle were examined as a model system. These luciferases emit 
light ranging from green to orange, though their amino acid sequences differ by less 
than 5%. Through mutation of their respective cDNA clones, the amino acids responsi- 
ble for the colour variation were identified. These specific amino acids are few, and 
they act upon colour independently with respect to the enzyme structure. Analysis of 
their effects indicates that the potential for colour variation among beetle luciferases 
is greater than is evident among the click beetle luciferases. Because of the subtle 
changes of enzyme structure that affect colour, luciferases that emit different colours 
may be useful as paired genetic reporters. They should interact equivalently with the 
intracellular environment of a host, but could be distinguished by colour in their assay. 
Such paired reporters could be used to observe simultaneous events, or to provide 
internal control for luminescence measurements. 

Keywords: Firefly luciferase; click beetle luciferases; reporter genes; colour variation 



INTRODUCTION 

The production of light by enzymatic catalysis 
offers unique opportunities for probing biochemi- 
cal processes. This is because of the high energy 
density of photons and their unusual presence in a 
biochemical milieu. In applications of biolumines- 
cence, the chemistries of luminous bacteria and 
beetles have been dominant. Although the mechan- 
isms of these two systems are entirely different, they 
both are amenable to manipulations: the enzymes 
are reasonably stable and easily purified, the luci- 
ferins are available by chemical synthesis, and the 
other substrates are readily obtainable in pure 
form. 

Luminescence as a biochemical gauge is based 

0884-3996/90/020107 -08$05.00 
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on correlating light with a limiting component of 
the enzymatic reaction— changes in concentration 
of the limiting component cause proportionate 
variation in light emission. Initial applications of 
luminescence used cofactors as the limiting compo- 
nents. For example, firefly luciferase has been 
widely used to measure ATP. Similarly, bacterial 
luciferase coupled to an oxidoreductase has been 
used to measure NADH. With coupling to other 
enzymes, these luciferases have also been used to 
measure other biochemical molecules (McElroy 
and DeLuca, 1983). 

Recently, a new class of luminescence applica- 
tions has arisen where the enzyme is the limiting 
component. In these applications, light emission is 
linked to events associated with gene regulation 
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and protein metabolism. This was made possible 
with the cloning of genes that code the luciferases 
(Cohn et al., 1983; de Wet et al., 1985). These genes 
can be introduced into living cells, or reconstituted 
enzyme systems, so that the synthesis of luciferase is 
contingent upon the kinetics of gene expression. 
With excess substrates, luminescence is proportion- 
al to the concentration of newly synthesized en- 
zyme. 

Because of the prevalence of research in molecu- 
lar genetics, applications of bioluminescence are 
most auspicious in this area. In eukaryotic systems 
especially, the use of firefly luciferase has been 
notable. This monomeric enzyme, evolved in a 
eukaryotic host, requires no post-translational 
modifications for its catalytic activity. Under opti- 
mal conditions, it catalyses production of yellow- 
green light with exceptional efficiency (McElroy 
and DeLuca, 1985; Seliger and McElroy, 1960). 

The general suitability of this luciferase as a 
genetic reporter has made it useful in a variety of 
experimental designs. Most commonly it has been 
used in examining the DNA structure of genetic 
regulatory elements (Economou et al., 1989; Hud- 
son et al., 1989; van Zonneveld et al., 1988). Some 
studies have used this luciferase to investigate other 
proteins that influence gene transcription (Mellon 
et al., 1989; Waterman et al., 1988). Also studied 
have been effects of mRNA structure on protein 
synthesis (Malone et al., 1989; Baughman and 
Howell, 1988), and relative rates of intracellular 
protein recycling (Nguyen et al., 1989). The firefly 
luciferase has in some instances been used to delin- 
eate genetic events in multicellular organisms 
(Rodriguez et al., 1989; Ow et al., 1986). 

Common to these luminescence applications is 
that measurements are made of light intensity 
alone. However, this is only one mode by which 
light can carry information. Another prominent 
property of light is its spectral distribution, i.e. the 
colour of light. If this property could be used in 
addition to intensity, it could add another dimen- 
sion to the information transmitted by the lucifer- 
ases. Each luciferase elicits a characteristic spectral 
distribution. Even within the distinct groups of 
beetle or bacterial luciferases there is variation of 
colour. Since the substrates within these groups do 
not differ, the colour variation must be due to 
differences in enzyme structures. 

Colour variation is especially prominent among 
the beetle luciferases. A spectacular example of this 
variation occurs in a tropical click beetle from 
Jamaica, Pyrophorus plagiophthalamus. The beetle 
has two sets of light organs, a pair on the dorsal 
surface of the prothorax, and a single organ in a 
ventral cleft of the abdomen. Generally the dorsal 
pair emits green light, and the ventral organ emits 
yellow light. Hence, this is an unusual example of 
an organism that emits two different colours of 
light. Even more unusual is that variation in colour 
occurs between individuals of the population. The 
dorsal organ varies in colour from green to yellow- 
green, and the ventral organ varies from green to 
orange (Biggley et al., 1967). 

Because of the wide range of colours found in this 
single species, it was chosen as a model of colour 
variation among beetle luciferases. Research was 
begun to investigate the relationship between en- 
zyme structure and the colour of luminescence. 
Results of the ongoing project have revealed some 
general aspects of this relationship. Substantial 
changes in colour can result from substitutions of 
single amino acids in the primary structures of the 
enzymes. These substitutions can occur at several 
different positions, and the effects of different substi- 
tutions act independently. A quantitative analysis 
of several substitutions has indicated that the 
potential for colour variation in beetle luciferases is 
much greater than the range of colours found in 
this particular beetle species. These results foretell 
the feasibility of using colour as an additional 
parameter in luminescence assays. 



COLOUR VARIATION IN P. PLAGIOPH- 
THALAMUS 

To study the luciferases of the Jamaican click 
beetle, the cloning techniques previously employed 
to clone the firefly luciferase were used (de Wet et 
al., 1985). The luciferases of the ventral light organ 
were chosen for initial study because of their wider 
range of colour variation. Screening a cDNA 
library made from this organ, both for antigenic 
epitopes and for luminescence activity, resulted in 
11 clones with complete coding regions. When 
expressed in E. coli, these clones can produce 
sufficient bioluminescence to be easily visible. 

The clones are of four types determined by the 
colour of light elicited: green, yellow-green, yellow, 
and orange (Fig. 1). Among the seven clones that 
produce yellow light, or the three that produce 
orange light, the spectra are indistinguishable. Only 
one green and one yellow-green light-producing 
clone were obtained. As determined by the posi- 
tions of the peak intensities, the range of colours 
produced by the clones is the same as was measured 
from living beetles (Biggley et al., 1967). However, 
there are colours displayed by living beetles that 
are not represented by the clones. Some of these 
colours may be the result of heterozygous beetles 
expressing a mixture of luciferases. Of course, the 
genes of some colours may not have been cloned. 

Figure 1. Luminescence spectra of the click beetle lucifer- 
ases: GR, green; YG, yellow-green; YE, yellow; OR, orange. 
Intensity maxima are normalized for comparison. 

The amino acid sequences of the click beetle 
luciferases were determined from the nucleic acid 
sequences of the clones. In paired comparisons 
these amino acid sequences are 96% to 99% identi- 
cal, differing by 26 to 3 amino acids respectively. 
The comparisons reveal the genealogy of these 
enzymes, which shows that they have evolved in the 
order of their colours. Thus, luciferases of similar 
colour are of more similar sequence. Also, the most 
recently evolved luciferase emits orange light; the 
green-emitting luciferase is the oldest. So the se- 
quences of the luciferases emitting orange and 
yellow are more similar than those of the green and 
yellow-green. 

As noted above, the variation in colour must lie 
within differences of the enzyme structures. Since 
there is no evidence of post-translational modifica- 
tion in the luciferases, the differences should be 
evident within the amino acid sequences. To deter- 
mine which (and how many) of the amino acids 
affect colour, mutants of the luciferases were made 
by modifying their cDNA clones. The mutants were 
made by two methods. Many were made simply by 
exchanging restriction fragments between the 
clones. The resulting new luciferases were named 
rearrangement hybrids. Other mutants were made 
using synthetic oligonucleotides for site-specific 
changes. 

Because the rearrangement hybrids were made 
by swapping segments of genetic code, they often 
contain multiple amino acid substitutions. Nota- 
tion to describe these substitutions is, for example, 
R223, L238 -> E, V. This depicts the replacement of 
arginine at position 223 and leucine at position 238 
by glutamate and valine respectively. The resulting 
changes in the colour are reported in wavenumbers 
instead of wavelength since wavenumbers are pro- 
portional to energy, an additive quantity. In the 
example, the substitutions cause a colour shift of 
-520 cm⁻¹, from 17,760 cm⁻¹ of the parent luci- 
ferase to 17,240 cm⁻¹ of the progeny. 
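
As an illustration of this convention (not part of the original analysis), the shift in the example can be restated in wavelength terms. The sketch below assumes only the standard relation between wavenumber in cm⁻¹ and wavelength in nm; the function name is hypothetical.

```python
def wavenumber_to_wavelength_nm(wavenumber_cm: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (1 cm = 1e7 nm)."""
    return 1e7 / wavenumber_cm


parent = 17760.0   # cm^-1, peak of the parent luciferase (from the text)
progeny = 17240.0  # cm^-1, peak of the progeny luciferase (from the text)

shift = progeny - parent  # negative: a shift towards the red
parent_nm = wavenumber_to_wavelength_nm(parent)
progeny_nm = wavenumber_to_wavelength_nm(progeny)

print(f"shift = {shift:.0f} cm^-1")
print(f"parent = {parent_nm:.1f} nm, progeny = {progeny_nm:.1f} nm")
```

Because wavelength is inversely related to energy, the same shift in wavenumber corresponds to different shifts in nanometres at different parts of the spectrum, which is why the additive comparisons that follow are made in wavenumbers.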

The results of studying several mutants show 
that colour differences among the yellow-green-, 
yellow-, and orange-emitting luciferases are due 
predominantly to three amino acid substitutions. 
The colour difference between the yellow- and 
orange-emitting luciferases is due entirely to 
S247 -> G. Approximately 90% of the colour differ- 
ence between the yellow-green- and yellow-emit- 
ting luciferases is due to two substitutions, R223 -> 
E and L238 -> V; the colour shift caused by L238 -> 
V is about 1.3-fold greater than that of R223 -> E. 
The remaining 10% of colour difference is due to 
one or more substitutions of L[illegible], D226, V282 [illegible]283, 
V323, V389 -> I, E, IV, I, I. Likewise, the amino acids 
affecting colour of the green-emitting luciferase 
have not yet been precisely determined because of 
the large number of sequence differences between 
this and the other luciferases. 

From other mutants it is evident that the substi- 
tutions that affect colour do so regardless of the 
parent luciferase. For example, the S247 -> G sub- 
stitution, which causes the yellow-emitting lucifer- 
ase to produce orange light, can be applied to the 
yellow-green-emitting luciferase. This causes an an- 
alogous shift of colour to yellow. Similarly, E223 -> 
R, which partially effects the shift from yellow to 
yellow-green, produces an analogous shift towards 
green when applied to the orange-emitting lucifer- 
ase. Apparently, colour differences evolved in the 
click beetle luciferases by the cumulative effects of 
individual substitutions. That is, the change from 
yellow-green to orange requires the combined 
action of R223 -> E, L238 -> V, and S247 -> G (with 
small contributions from other substitutions). Pre- 
sumably, this general scheme applies also to the 
green-emitting luciferase. 



INDEPENDENCE OF SUBSTITUTIONS 
AFFECTING COLOUR 

When amino acid substitutions in an enzyme act 
independently, the action of one substitution 
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should not affect the action of the other. As a 
corollary, the individual effects of the substitutions 
should be additive. In the opposite situation, ex- 
treme cooperativity, the action of one substitution 
is fully dependent on the other. Among the click 
beetle luciferases, the cumulative action of the 
substitutions that affect colour rules out extreme 
cooperativity. Superficially, however, the substitu- 
tions do not appear to act entirely independently. 

For example, the rearrangement hybrid with 
substitutions E223, V238 -> R, L causes the colour of 
the yellow-emitting luciferase to shift 490 cm⁻¹. 
However, the summed effects of E223 -> R and 
V238 -> L applied individually to the yellow-emitting 
luciferase is a shift of 420 cm⁻¹. Thus, the individual 
effects appear to be 14% less than their combined 
effect. As another example, the substitution S247 -> G 
applied to the yellow-emitting luciferase causes a 
shift of -430 cm⁻¹. However, when the same 
substitution is applied to the yellow-green-emitting 
luciferase, the resulting shift is -580 cm⁻¹, 35% 
greater. 
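
The two percentages quoted above can be checked directly from the shifts given in the text. The following is a minimal sketch; the helper name is hypothetical and the values are shift magnitudes in cm⁻¹.

```python
def percent_change(value: float, reference: float) -> float:
    """Percentage by which value differs from reference."""
    return 100.0 * (value - reference) / reference


hybrid_shift = 490.0       # cm^-1, E223, V238 -> R, L applied together
summed_individual = 420.0  # cm^-1, sum of the two substitutions taken individually
shortfall = -percent_change(summed_individual, hybrid_shift)

on_yellow = 430.0          # cm^-1, magnitude of S247 -> G on the yellow-emitting enzyme
on_yellow_green = 580.0    # cm^-1, same substitution on the yellow-green-emitting enzyme
excess = percent_change(on_yellow_green, on_yellow)

print(f"individual effects fall short by {shortfall:.0f}%")
print(f"shift on the yellow-green parent is greater by {excess:.0f}%")
```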

Close examination of the shifts caused by each 
type of substitution has revealed a consistent re- 
lationship with regard to the colour of the parent 
luciferase. Specifically, substitutions have a greater 
effect on colour when applied to luciferases of 
greater wavenumber. This effect is shown quantita- 
tively in a plot of shift magnitude vs shift position 
(Fig. 2). The magnitude is simply the absolute value 
of the difference in positions of spectral maxima 
between the parent and progeny luciferases, i.e. the 
shift without regard to sign. The position of the 
shift was taken as the average of the positions of 
spectral maxima for the parent and progeny lucifer- 
ase. This was chosen instead of the position for the 
parent luciferase since, in theoretical consider- 
ations, the distinction between parent and progeny 
is arbitrary. Thus, by using the average as a mea- 
sure of position, the choice of parent or progeny is 
moot. 
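
The two quantities plotted in Fig. 2 follow directly from these definitions. As a minimal sketch (the function name is hypothetical, and the example values are the parent and progeny positions quoted earlier in the text):

```python
def shift_magnitude_and_position(parent_cm: float, progeny_cm: float):
    """Return (magnitude, position) of a spectral shift, both in cm^-1.

    magnitude: absolute difference between the two spectral maxima
    position:  average of the two spectral maxima
    """
    magnitude = abs(progeny_cm - parent_cm)
    position = (parent_cm + progeny_cm) / 2.0
    return magnitude, position


forward = shift_magnitude_and_position(17760.0, 17240.0)
reverse = shift_magnitude_and_position(17240.0, 17760.0)
print(forward, reverse)
```

Swapping parent and progeny leaves both values unchanged, which is the reason given above for using the average as the measure of position.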

It is apparent from the plot that the different 
types of substitutions define a converging set of 
trends. The substitutions (or groups of substitu- 
tions), whose resulting shifts are shown as filled 
symbols in Fig. 2, represent independent sets. The 
grouped substitutions, whose resulting shifts are 
shown as open symbols, include within themselves 
substitutions of the independent sets. Although 
these shifts do not result from independent substi- 
tutions, they are derived from independent mea- 
surements of rearrangement hybrids. Thus, they 
provide additional evidence for the trends. 




[Figure 2 abscissa: position of shift (cm⁻¹ × 10⁻³)] 

Figure 2. Relationship of shift magnitude to the position of 
the shift. Key to amino acid substitutions is shown on the left. 
Filled symbols show shifts caused by independent substitu- 
tions; open symbols show shifts resulting from combined 
sets of substitutions. Bars above and below each data point 
indicate one standard deviation of experimental error. See 
text for criteria of the axes and description of interpolated 
lines. 



Using linear least-squares analysis, a set of lines 
was determined for the shifts of each set of substitu- 
tions. Most of these lines cross the abscissa near 
15,500 cm⁻¹. The lines drawn in Fig. 2 are an 
interpretation of the data. These lines converge to a 
single point on the abscissa. The inference is that 
there exists a unique minimum for the position of 
the luminescence spectrum. Effects of the substitu- 
tions are not able to exceed that minimum. That is, 
there is a limit to how red the luminescence can be 
and no alterations to the luciferase structure can 
result in light that is more red. 

This seems reasonable with regard to the struc- 
ture of the light-emitting molecule. Beetle lucifer- 
ases catalyse light production by combining ATP 
and luciferin to form luciferyl-AMP, which is then 
oxidized to oxyluciferin. The oxyluciferin is formed 
in an electronically excited state, and a photon is 
generated upon its transition to the ground state. 
The colour of luminescence is determined by the 
energy difference between the ground and excited 
states. This difference is influenced by interactions 
of the oxyluciferin with the enzyme structure. 
Therefore, colour can be affected by substitutions 
of specific amino acids. However, without changes 
to the bonding structure of oxyluciferin, there 
should exist a minimum possible energy difference 
between the ground and excited states. This mini- 
mum would impose a minimum of the energy of the 
emitted photon, i.e. the colour of light. 

The lines in Fig. 2 were drawn to converge at 
15,400 cm⁻¹. This value is the position of maximal 
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intensity for the luminescence generated by oxida- 
tion of luciferyl-AMP in aqueous buffer without 
enzyme (White et al., 1971). It is the lowest wave- 
number value measured for the luminescence of 
oxyluciferin under any conditions, with or without 
enzyme. It is known that the colour of luminescence 
elicited from non-enzymatic oxidation is dependent 
on the polarity of the solvent. Under less polar 
conditions, such as in DMSO, the spectrum of 
luminescence shifts to greater wavenumbers (White 
et al., 1971). Water is a highly polar medium and 
may provide conditions for luminescence of mini- 
mum energy. 

Drawing the trends of spectral shifts as lines 
converging at 15,400 cm⁻¹ is in good agreement 
with the data; each of the data points is near its 
respective line within one standard deviation of 
experimental error. Thus, the empirical data is in 
accord with the hypothesis of an energy minimum 
for the luminescence. Since this minimum was 
measured from a non-enzymatic reaction, it en- 
dorses the belief that this minimum is determined 
entirely by physical properties of oxyluciferin. 
Hence, where the lines converge in Fig. 2 should be 
independent of structural features particular to the 
click beetle luciferases. Moreover, colour changes 
caused by amino acid substitutions in any beetle 
luciferase should exhibit trends that converge to 
this same minimum value. 

The apparent lack of independence noted above 
between colour shifts is evident in the slopes of the 
lines in Fig. 2. If the shifts had displayed the sense of 
independence described at the beginning of this 
section, then the slopes would have to be zero. That 
is, regardless of the colour of the parent luciferase, 
the substitutions would cause the same magnitudes 
of shift. Yet, slopes of zero would imply no mini- 
mum to the energy of luminescence. Thus, the 
interdependence between the effects of substitu- 
tions imposed by the slopes is due to physical 
limitations of the substrate. 

Within this constraint, however, the effects of the 
substitutions behave completely additively. For 
example, calculated from the lines of Fig. 2, E223 -> 
R applied to the yellow-emitting luciferase would 
result in a shift of 204 cm⁻¹. Applying V238 -> L to 
the resulting hypothetical luciferase would result in 
an additional 288 cm⁻¹. The sum of these shifts, 
492 cm⁻¹, is only 1% less than the expected value of 
E223, V238 -> R, L applied to the yellow-emitting 
luciferase, 498 cm⁻¹. Similarly, a shift caused by 
S247, D266, V282 [illegible]283, R351 -> G, E, IV, G is equal to 
the sum of its component substitutions. Because the 



trends of Fig. 2 are described by lines, additive 
relationships demonstrated for one luciferase are 
the same when applied to other luciferases. Also, it 
follows that the order in which the substitutions are 
considered is unimportant. 
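
One way to see why the order is unimportant is to model each trend of Fig. 2 as a straight line through the convergence point at 15,400 cm⁻¹. The sketch below is an illustration under that assumption only: the slopes and the parent position are hypothetical values, not fitted data, and the function covers substitutions that shift luminescence towards greater wavenumber.

```python
NU_MIN = 15400.0  # cm^-1, convergence point of the Fig. 2 lines (from the text)


def apply_substitution(parent_cm: float, slope: float) -> float:
    """Peak position of the progeny after one substitution.

    Assumed model: magnitude = slope * (position - NU_MIN), where position is
    the mean of the parent and progeny peaks. Solving
        m = slope * (parent + m / 2 - NU_MIN)
    for the magnitude m gives the expression below.
    """
    magnitude = slope * (parent_cm - NU_MIN) / (1.0 - slope / 2.0)
    return parent_cm + magnitude


# Hypothetical values chosen only for illustration.
parent = 17240.0               # cm^-1
slope_a, slope_b = 0.10, 0.14  # slopes of two trend lines

a_then_b = apply_substitution(apply_substitution(parent, slope_a), slope_b)
b_then_a = apply_substitution(apply_substitution(parent, slope_b), slope_a)

# The same substitution applied to parents of lower and higher wavenumber.
shift_low = apply_substitution(17240.0, slope_a) - 17240.0
shift_high = apply_substitution(17760.0, slope_a) - 17760.0

print(f"a then b: {a_then_b:.1f} cm^-1, b then a: {b_then_a:.1f} cm^-1")
print(f"shift from 17240: {shift_low:.1f}, from 17760: {shift_high:.1f}")
```

Under this model each substitution multiplies the distance from the minimum by a constant factor, so the final position does not depend on the order of application, while the magnitude of a given shift still grows with the wavenumber of the parent.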

Therefore, respective to the slopes of the lines, the 
effects of the substitutions act fully independently. 
In equivalent terms, the substitutions are indepen- 
dent with regard to their action on the substrate; 
the apparent interdependence of colour shifts can 
be attributed to properties of oxyluciferin. There is 
no evidence of dependent relationships mediated 
by the structures of the enzymes. That is, there are 
unlikely to be any interactions within the enzyme 
structures between the amino acids at the positions 
of the substitutions. 



POTENTIAL FOR COLOUR VARIATION 
AMONG BEETLE LUCIFERASES 

The colours of luminescence emitted by the Jamai- 
can click beetle define nearly the full range of 
colours found in all luminous beetles (Lall et al., 
1980). Yet, the trends depicted in Fig. 2 suggest a 
potential for colour variation in beetle luciferases 
that is much greater. The range spanned by the 
click beetle luciferases is 1400 cm⁻¹. If the lower 
end of this range were extended to its theoretical 
limit, the full range would double to 2800 cm⁻¹. 
Furthermore, there is no indication in Fig. 2 of an 
upper limit to the possible range. Certainly an 
upper limit exists owing to conservation of energy 
in the luminescent reaction. However, it is un- 
known what further considerations could impose a 
more strict upper limit. 

The ability of beetle luciferases to support redder 
colours of luminescence is evident in the lumines- 
cence of the firefly luciferase (P. pyralis). Though 
this enzyme normally emits yellow-green light, 
under several conditions it emits red light of 
16,160 cm⁻¹ (McElroy and DeLuca, 1985). Some 
of these conditions are pH below 7, temperature 
above 30°C, and the presence of heavy metals such 
as Hg²⁺. Chemical modification to the enzyme can 
also result in red luminescence (Alter and DeLuca, 
1986). This red colour extends the range of enzy- 
matic luminescence by 50% over that of the click 
beetle luciferases alone. 
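
The figures in this section are mutually consistent, as the short calculation below suggests. It is a sketch under stated assumptions: the end points of the click beetle range are not given in the text and are inferred here from the 1400 cm⁻¹ range, the 2800 cm⁻¹ doubled range, and the 15,400 cm⁻¹ limit.

```python
NU_MIN = 15400.0             # cm^-1, theoretical lower limit (from the text)
CLICK_BEETLE_RANGE = 1400.0  # cm^-1, range of the click beetle luciferases (from the text)
FULL_RANGE = 2800.0          # cm^-1, range if extended to the limit (from the text)
FIREFLY_RED = 16160.0        # cm^-1, red emission of firefly luciferase (from the text)

# Inferred end points of the click beetle range (not stated explicitly in the text).
upper_end = NU_MIN + FULL_RANGE
lower_end = upper_end - CLICK_BEETLE_RANGE

# How far the firefly red emission extends the range towards the red.
extension = lower_end - FIREFLY_RED
fraction = extension / CLICK_BEETLE_RANGE

print(f"inferred click beetle range: {lower_end:.0f} to {upper_end:.0f} cm^-1")
print(f"extension by firefly red: {extension:.0f} cm^-1 ({100 * fraction:.0f}%)")
```

On these inferred end points the red emission adds a little under half of the click beetle range, in line with the extension of roughly 50% quoted above.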

Nature has also provided one known example of 
red beetle luminescence in a rare species called 
Phrixothrix. This larviform beetle of South Amer- 
ica has two rows of light organs that emit yellow- 
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green light, and a pair near the head that emit red 
light. Although the spectrum of this red has not been 
measured, it is of much lower wavenumber than the 
orange of the click beetle. Thus, red luminescence is 
also possible through the mechanism of natural 
evolution. Since post-translational modifications 
that affect colour are not found in either the firefly 
or click beetle luciferases, the evolution of Phrixoth- 
rix luciferase is probably also mediated by modifica- 
tions to the amino acid sequence. 

So why do virtually all species of luminous 
beetles emit light in the limited range of green to 
yellow? The above discussion indicates that the 
enzyme is capable of supporting a much larger 
range. The reason may be in the 'motive' of beetle 
luminescence. The system has evolved to maximize 
communication between beetles, i.e. to maximize 
visibility. An essential aspect of this is the way in 
which colour interacts with the environment. For 
example, green is the colour of maximum reflec- 
tivity for foliage. Also, measurements of ambient 
light at dusk in a foliated area reveal a minimum 
near yellow (Seliger et al., 1982b). The colours of 
beetle luminescence may be partially dictated by 
these environmental parameters, depending on the 
behavioural characteristics of the species. Evidence 
for this has been documented for firefly lumines- 
cence (Seliger et al., 1982a,b). 

However, the needs of beetle communication do 
not necessarily equal the needs of genetic research. 
In applications to utilize luciferases of different 
colours, a wide range would be more useful. 
Further study of colour variation in beetle lucifer- 
ases should allow development of synthetically 
modified enzymes that elicit colours not found in 
nature. This could accord access to the full colour 
potential of this luciferase system. The lesson of the 
click beetle luciferases is that such modifications 
may encompass only substitutions of indepen- 
dently acting amino acids. Moreover, this natural 
example shows that there may be many candidates 
for such substitutions. 

This follows from the relatively recent evolution- 
ary history that brought about the different colours 
of the click beetle. Evolution is unable to plan its 
course; it can only operate by selection of randomly 
provided mutations. Yet, in the 26 amino acids that 
distinguish the four click beetle luciferases, more 
than four affect colour. Thus, by trial-and-error 
evolution, acceptable candidates for colour varia- 
tion were rapidly found. It can be inferred that little 
of the mutagenic potential of these enzymes was 
tested by this process from comparison with the 
firefly luciferase. The luciferases from these two 
beetle species differ in amino acid sequence by 51%. 
Thus, amino acid substitutions are potentially ac- 
ceptable at 275 positions. For each position, several 
different amino acids may suffice. In brief, the beetle 
luciferases appear to hold much potential for modi- 
fication and much potential for variation in colour 
of luminescence. 



APPLICATION OF COLOUR IN 
LUMINESCENCE ASSAYS 

In luminescence assays, light intensity is the signal 
that conveys information of biochemical events. 
With the use of two luciferases that emit light of 
different colours, it should be possible to provide 
two signals simultaneously. Beetle luciferases offer 
the potential of providing such a two-coloured 
system. This probably would be most useful in 
applications associated with molecular genetics. 
This is because applications utilizing two colours 
would generally depend on the enzymes as the 
limiting components of the assay. The concentra- 
tion of the enzymes must be the source of the 
signals since the distinction of colour lies in the 
enzyme structures. Assays based on a substrate as 
the limiting component, ATP for example, would 
not benefit because the enzymes of both colours 
would generate signals dependent on the same 
condition. Thus, additional information could not 
be gained with the second colour. 

The obvious use of two colours would be for 
simultaneous detection of two different events. This 
would be especially useful when the events are 
coordinated. An example could be the transcrip- 
tional activities of promoters regulated by a com- 
mon mechanism. Moreover, the promoters need 
not be of a single host, such as with regulation 
mediating symbiotic or parasitic relationships. The 
luciferases also need not be used for quantitative 
measurements, but merely as markers for two 
populations. For instance, populations of a colony- 
forming organism could be identified visually by 
their colour of luminescence. The genes coding the 
luciferases may also offer a method of detecting 
genetic recombination events, depending on the 
positions of the colour-determining nucleic acid 
substitutions. 

Another general use of two colours would be for 
provisions of an internal control in luminescence 
measurements. Precision in genetic measurements 
can be important, especially in eukaryotic hosts 
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where differences of two- to three-fold are signifi- 
cant. Internal controls are often needed to compen- 
sate for uncontrollable variables. A common 
example is in experiments where DNA is intro- 
duced into cells for measurements of transient gene 
expression. To compensate for variation in the 
efficiency of transfection, a second genetic reporter 
is sometimes included (Dirks et al., 1989; Day and 
Maurer, 1989). 

After a gene is introduced into a cell, there are 
other potential variables of gene expression, such as 
rates of translation and protein stability. These 
variables that concern the behaviour of a reporter 
within a cell are more difficult to compensate for by 
using a second reporter. The problem is that differ- 
ent reporters can behave quite differently in a 
common host. For instance, comparisons of lucifer- 
ase to another commonly used reporter, chloram- 
phenicol acetyl transferase (CAT), reveal 
substantially different kinetics of expression (Max- 
well and Maxwell, 1988). As expected, the struc- 
tures of these dissimilar proteins interact differently 
with the complex metabolism of the host. 

The ideal solution would be to use reporters 
whose structures are identical, yet could be distin- 
guished in their assay. Herein could be the major 
advantage of beetle luciferases. Since only a few 
amino acid substitutions are needed to alter colour, 
the overall structures of the reporters could be 
virtually identical. Hence, there would be little to 
allow discrimination of their interactions with an 
experimental host. That is, the host could not 
differentiate one reporter from the other. Upon 
assay, however, distinction between the reporters 
would be made by their colours of luminescence. 
The similarity between the luciferases would be 
especially prominent if the distinguishing amino 
acids were internal to the protein structures. This 
may be the case since the amino acids that affect 
colour are likely to be close to the luciferin binding 
site. 

Normally, experimental controls are imple- 
mented in genetic experiments through compar- 
isons of a test population with a control 
population. Inclusion of an internal control would 
be most useful when inter-experimental variation is 
large, or replica experiments are difficult to obtain. 
For instance, replica populations could be difficult 
to achieve when the experimental host is not der- 
ived from a stable clonal source. In this circum- 
stance, comparisons between test and control 
populations would be difficult. However, an inter- 
nal control would allow for simultaneous compar- 



isons of a test and control within an experiment. 
The closely matched structures of beetle luciferases 
could provide a means for internal control in exper- 
iments that utilize reporters. The light intensity of 
one colour would serve as the test signal, and the 
other the control signal. An example in which this 
may be useful is in measurements of transgenic 
organisms. Even though the hosts in these experi- 
ments may come from a clonal stock, activity of 
exogenous genes inserted into their chromosomes 
can be strongly position-dependent. 

An especially promising attribute of beetle luci- 
ferases as genetic reporters is the ability to detect 
their activity from within living cells. Two sub- 
strates of the luminescent reaction, ATP and O2, 
are available in the cellular interior. The third 
substrate, luciferin, can gain access to the interior 
by diffusion through the membrane. Thus, in cells 
expressing luciferase, a luminescent signal can be 
generated for external detection. Because photons 
are created at the instant of catalysis and do not 
accumulate, the signal is a 'real-time' indicator of 
the intracellular luciferase concentration. However, 
quantitative measurements in living cells can be 
obscured by several variables. For instance, the 
enzyme may not be the limiting component of the 
assay in the intracellular environment since the 
availability of the other substrates could be limit- 
ing. Thus, changes in light intensity could reflect 
variations in any of the components. 

Other factors, such as those described previously, 
also may conceivably affect intracellular lumines- 
cence. Use of two beetle luciferases to provide 
internal control could compensate for these in- 
fluences. The test signal of one luciferase would be 
coupled to the gene of interest, and the control 
signal coupled to a reference gene. A suitable 
reference gene could be one of constitutive activity, 
a so-called 'housekeeping' gene. Measurements 
would be made not of absolute light intensity of the 
reporter signal, but by the relative intensity of the 
test signal compared with the control signal. Con- 
ditions that would affect both signals, such as 
changes in concentration of accessible internal sub- 
strates, could therefore be compensated for. By this 
method of measurement, indications of genetic 
regulation could be made with direct reference to 
the baseline genetic activity of the host. 
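
The ratio measurement described above can be illustrated with a minimal sketch. The readings are hypothetical values in arbitrary light units, and the function name is an assumption made for illustration; the point is only that a factor acting on both signals (here, a halving of accessible substrate) cancels in the ratio.

```python
def normalized_expression(test_signal, control_signal):
    """Return the test light intensity relative to the control intensity."""
    if control_signal <= 0:
        raise ValueError("control signal must be positive")
    return test_signal / control_signal


# Hypothetical readings from two cell samples. Sample B is assumed to have
# half the accessible substrate of sample A, so both of its signals are halved.
sample_a = normalized_expression(test_signal=800.0, control_signal=200.0)
sample_b = normalized_expression(test_signal=400.0, control_signal=100.0)

print(sample_a, sample_b)  # both ratios are 4.0
```

A condition that affects only one of the two signals would not cancel this way, which is why the passage stresses closely matched reporters.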

In detail and variation, there doubtless are many 
ways in which beetle luciferases of different colours 
could be useful. The general suitability of a single 
luciferase as a genetic reporter is already amply 
demonstrated. Since introduction of this applica-
tion, each year has borne increased numbers of
citations of its use in the scientific literature. The 
additional potential offered by luciferases that emit 
different colours lies in the subtlety of their struc- 
tural differences. Such a matched pair of reporters, 
with the sensitivity and versatility of the luciferases, 
has not been manifested by other systems. Thus, the 
potential capabilities of beetle luciferases may not 
only improve current methods of assay, but in 
addition may endorse new methods. The possibili- 
ties presented here are based on our recent knowl-
edge of the beetle luciferases, but further research
will be needed to test the limits of these possibilities.

Complementary DNA Coding Click Beetle Luciferases
Can Elicit Bioluminescence of Different Colors



Keith V. Wood,* Y. Amy Lam, Howard H. Seliger,
William D. McElroy

Eleven complementary DNA (cDNA) clones were generated from messenger RNA
isolated from abdominal light organs of the bioluminescent click beetle, Pyrophorus
plagiophthalamus. When expressed in Escherichia coli, these clones can elicit
bioluminescence that is readily visible. The clones code for luciferases of four types,
distinguished by the colors of bioluminescence they catalyze: green (546 nanometers),
yellow-green (560 nanometers), yellow (578 nanometers), and orange (593 nanometers).
The amino acid sequences of the different luciferases are 95 to 99 percent identical with
each other, but are only 48 percent identical with the sequence of firefly luciferase
(Photinus pyralis). Because of the different colors, these clones may be useful in
experiments in which multiple reporter genes are needed.



NEARLY ALL OUR KNOWLEDGE OF beetle luciferases is derived from
studies of a single species, the North American firefly Photinus pyralis.
Comparative studies with other beetle luciferases have been hampered
because of limited availability of the other species. Evolutionarily,
beetle luciferases are unrelated to any of the other groups of luciferases
that have been studied biochemically (1). Little is known about the
luciferases from other beetles except that they all catalyze the production
of various colors of light through the oxidative decarboxylation of beetle
luciferin (2). Since the substrates of the luminescent reaction are the
same in all these beetles, the different colors must be due to differences
in the structure of the enzymes (3).

Recently we cloned a cDNA that codes for the luciferase of P. pyralis, and
have shown that it can be used to express bioluminescence in Escherichia
coli. We report here the cloning of cDNAs that code for several new
luciferases from a bioluminescent click beetle, Pyrophorus
plagiophthalamus. This beetle is unusual because it can emit
bioluminescence of a wide range of colors from a single species. The
expression products in E. coli of the cDNAs derived from this beetle are
able to produce green, yellow-green, yellow, and orange light. As
determined from the nucleotide sequences of the clones, the amino acid
sequences of these click beetle luciferases are highly conserved among one
another, but diverge from the sequence of the firefly luciferase. Taxonomy
indicates that the click beetle luciferases probably are the most
evolutionarily distant of the beetle luciferases from the firefly
luciferase (4). This distance is reflected by differences in their
chemical properties.

Pyrophorus plagiophthalamus is a large beetle with two sets of light
organs. One set, on the dorsal surface of the head, emits light that is
greenish but the exact color varies between individual beetles of the
species, ranging from green (548 nm) to yellow-green (565 nm). The other
set, at the anterior of the abdomen, generally emits light of a longer
wavelength than the head organs but also varies between individuals,
ranging from green (547 nm) to orange (594 nm) (5). We converted mRNA
isolated from the abdominal light organ of 60 beetles to cDNA and inserted
this into a specialized lambda cloning vector, Lambda ZAP (6). The ability
to convert this modified lambda vector into a bacterial expression plasmid
(Bluescript) through an in vivo process allowed us to screen the cDNA
library by two methods (7). In the phage form of the library, we screened
with antibody to firefly luciferase that cross-reacts with the click beetle
luciferases (8) and isolated four full-length clones that expressed
bioluminescence in E. coli. A portion of the cDNA library was converted
into the plasmid form, and we screened this for bioluminescence in the
bacterial colonies. Bioluminescence can be initiated in colonies of E. coli
expressing luciferase by adding luciferin to the media (9). Seven more cDNA
clones were isolated by this method. It was determined visually that of
the eleven clones, one produced green light, one produced yellow-green
light, six produced yellow light, and three produced orange light.

Immunoblot analysis confirmed the production of full-length click beetle
luciferase in E. coli. Despite some of these clones being detected with
antibody to firefly luciferase during the library screening of plaques, we
could not detect the gene products in blots made directly with E. coli
lysates. The expression of bioluminescence was improved by transferring the
cDNA clones into a plasmid vector incorporating the tac promoter (10). A
lysate from E. coli expressing the green-emitting luciferase from this
vector was partially purified. After gel electrophoresis and blotting, a
single antigenic band was revealed that comigrated with the native click
beetle luciferase. Subsequently one cDNA clone from each of the four
color-emitting groups was sequenced. An open reading frame was revealed in
each that could potentially code a protein, the sequence of which
correlated with the entire length of the sequence for firefly luciferase.
Thus the complete protein coding regions of the click beetle luciferases
were apparently contained within their cDNA clones.

Expression of bioluminescence from the tac vector yielded sufficient
intensity, upon addition of luciferin to the media, to allow measurement of
the spectral distribution from intact cells (Fig. 1). This confirmed the
visual assignment of the 11 cDNA clones into four color groups. For each of
the four colors, the bioluminescence spectrum is a single peak
qualitatively similar to the spectra of the native click beetle luciferases
(3). The range of colors from the clones is representative of the full
range measured from the abdominal light organs of living beetles. However,
there are colors emitted by the beetles, within the extremes of this range,
that do not correspond to any of the clones (5). Thus other luciferase
genes may not have been isolated. Spectra of the luciferases were also
measured from partially purified preparations obtained from lysates of the
E. coli expressing the cDNA clones.



Fig. 1. Bioluminescence from colonies of E. coli expressing the click
beetle luciferases. Four streaks of E. coli, each consisting of hundreds of
colonies, show the four colors of bioluminescence emitted by the different
luciferases. The colonies were grown on nitrocellulose filters layered on
top of nutrient agar. To initiate the bioluminescent reaction, the filters
were removed from the agar and soaked with 1 mM luciferin in 100 mM sodium
citrate, pH 5.0. The photograph was produced from a 2-s contact exposure of
the colonies onto Ektachrome 64.






SCIENCE, VOL. 244 



Fig. 2. Spectra of the click beetle luciferases, from intact E. coli
immersed in luciferin, show four overlapping peaks of nearly even spacing.
The maximum intensity of each spectrum has been normalized. The spectra
were measured on a Fastie-Ebert-type grating spectrometer (2) and were
corrected for the spectral sensitivity of the photomultiplier tube and for
time-dependent variation in the intensity of the luminescent reactions.
Wavelength positions were calibrated against the spectral lines of a
mercury vapor lamp. Virtually identical spectra were obtained from lysates
of these E. coli between pH 6 to 8. The lysates were prepared from cells
grown to middle or late log-phase growth at 37°C, and then for 2 hours at
SCC with isopropylthiogalactoside (IPTG) added to 1 mM. The cells were
washed and resuspended in approximately 1/150 volume of the culture with
100 mM potassium phosphate, pH 7.0, 2 mM EDTA, and 35% of saturation
(NH4)2SO4. After lysis by sonication and removal of the debris by
centrifugation, (NH4)2SO4 was added to 53% of saturation and the
precipitate was dissolved in 1/15 volume of the lysate with 100 mM
potassium phosphate, pH 7.0, 2 mM EDTA, and 50% glycerol. The spectra were
measured from 10 µl of this solution diluted 100-fold with 50 mM
2-(N-morpholino)ethanesulfonic acid, 50 mM 2-(N-morpholino)propanesulfonic
acid (MOPS), 50 mM tricine, 5 mM MgSO4, 1 mM EDTA, 0.1 mM luciferin,
1.5 mM ATP, 1 mM NaF, 0.2 mg of bovine serum albumin per milliliter, and
10% glycerol. The spectra were measured at pH 6.0, 7.0, and 8.0.

[Plot: normalized spectra labeled Green (546 nm), Yellow-green (560 nm),
Yellow (578 nm), and Orange (593 nm); horizontal axis, Wavelength (nm).]
Fig. 3. Alignment of the amino acid sequences of the click beetle and the firefly luciferases is shown to
emphasize sequence differences. The sequence information is derived from the open reading frames of
the corresponding cDNA clones. The identity of each luciferase sequence is indicated at the right of
each line by a two letter code: FF, firefly; GR, green-emitting click beetle; YG, yellow-green-emitting
click beetle; YE, yellow-emitting click beetle; and OR, orange-emitting click beetle. Only the sequence
for the green-emitting click beetle luciferase is shown in entirety. Gaps in the alignment of this sequence
are indicated by hyphens. Other luciferase sequences have letter designations only at sites where they
differ from the green-emitting luciferase; where the sequences are the same there is a period. Numbers
on the right indicate the position of the amino acid at the end of each line. Abbreviations for the amino
acid residues are A, Ala; C, Cys; D, Asp; E, Glu; F, Phe; G, Gly; H, His; I, Ile; K, Lys; L, Leu; M, Met;
N, Asn; P, Pro; Q, Gln; R, Arg; S, Ser; T, Thr; V, Val; W, Trp; and Y, Tyr.



Between pH 6 to 7, the spectra of these preparations were
indistinguishable from those of intact cells. At pH 8 there was a slight
broadening of the spectra for the green- and yellow-emitting luciferases.
The firefly luciferase shows a large spectral shift between pH 6 to 8. At
pH 8 its spectral maximum is at 560 nm, which shifts to 615 nm (red) at
pH 6 with a decrease in the quantum yield (11).

The sequences of the different click beetle luciferases are highly similar
(Fig. 2). The open reading frame of each of the sequenced cDNA clones
potentially codes a 543-residue polypeptide. Comparisons of the derived
amino acid sequences show a 95 to 99% identity between the different
color-emitting luciferases. Thus the number of amino acids that are
responsible for the differences in the color is small. Because variation in
color results directly from differences in the primary structures of the
luciferases, specialized posttranslational modifications or unusual
microenvironmental effects are not necessary to account for the color
variation in the living beetles.

Comparison of the sequences of click beetle luciferases with that of
firefly luciferase shows a low similarity. Alignment of their deduced amino
acid sequences reveals that the various click beetle and the firefly
luciferases are 48% identical (Fig. 3). Six gaps in the alignment of one to
two amino acids in length account for most of a seven-amino acid difference
in the lengths of the open reading frames between the firefly and click
beetle luciferases. No regions in the alignment show especially high
sequence similarity, thus giving little indication that particular regions
have been conserved because of catalytic or structural constraints on the
enzymes. An exception to this is in the last three amino acids which, for
the firefly luciferase, have been shown to be necessary for translocation
into peroxisomes (12). Given the close functional similarity of these
enzymes, it is almost certain that the click beetle luciferases are also
located in peroxisomes.
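
Percent identity figures such as the 48% quoted above are computed from an alignment. The sketch below shows one common convention, which is an assumption here because the text does not state how gap positions were treated: columns containing a gap in either sequence are excluded from the count. The two aligned strings are toy data, not the luciferase sequences.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumed convention: skip columns that contain a gap
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared


# Toy alignment: 6 gap-free columns, 5 of them identical.
toy_identity = percent_identity("MEK-AVLG", "MDKQAV-G")
print(round(toy_identity, 1))
```

Other conventions (for example, dividing by the length of the shorter sequence) give slightly different percentages for the same alignment.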

Firefly luciferase has historically been used as a bioluminescent reporter
of chemical events associated with adenosine triphosphate (ATP) metabolism
(13). With the cloning of its cDNA, this luciferase has also recently found
application as an effective reporter of genetic events (14, 15). Its
principal advantages are that (i) the initial polypeptide derived from the
mRNA requires no posttranslational modifications for enzymatic activity;
(ii) the luminescent reaction can be measured with high sensitivity; (iii)
the assay of the gene product is rapid and does not use substrates
requiring special precautions (such as radioactive isotopes or chemically
unstable compounds); and (iv) gene expression may be detected without
disruption of living tissue. Compared with the conventionally used assay of
chloramphenicol acetyltransferase (CAT) for gene activity, firefly
luciferase is assayed in minutes as opposed to hours, and is 100 to 1000
times more sensitive (15).

The cDNAs coding for the click beetle luciferases also have these features,
and, as they can be distinguished by color, may be useful in situations
where multiple reporters are desirable. Expression in exogenous hosts
should differ little between these luciferases because of their sequence
similarity. Also, since the colors do not shift near physiological pH, the
different luciferases can be distinguished in vivo as well as in vitro.
Thus the click beetle luciferases may provide a dual reporter system that
can allow two different promoters to be monitored within a single host, or
for different populations of cells to be observed simultaneously. The
ability to distinguish each of the luciferases in a mixture, however, is
limited by the width of their emissions spectra. From calculations based
solely on the overlap of the spectra of the green- and orange-emitting
luciferases, one luciferase in a mixture should be detectable in the
presence of a 25-fold excess of the other.
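
How two overlapping emission spectra might be separated can be sketched as a two-channel linear unmixing problem. Everything beyond the two peak wavelengths (546 nm and 593 nm, from the text) is an assumption made for illustration: the Gaussian band shape, the 70 nm width, and the two filter windows are hypothetical, and the simulation is noise-free, so it does not reproduce the authors' 25-fold calculation. It only shows that known overlap fractions allow both contributions to be solved for.

```python
import math


def gaussian_spectrum(peak_nm, fwhm_nm, wavelengths):
    """Unit-area Gaussian emission band (an assumed shape, not measured data)."""
    sigma = fwhm_nm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    values = [math.exp(-0.5 * ((w - peak_nm) / sigma) ** 2) for w in wavelengths]
    total = sum(values)
    return [v / total for v in values]


def band_fraction(spectrum, wavelengths, low_nm, high_nm):
    """Fraction of a spectrum's light that falls inside a filter window."""
    return sum(s for s, w in zip(spectrum, wavelengths) if low_nm <= w <= high_nm)


def unmix(reading_short, reading_long, mixing):
    """Solve the 2x2 system mixing @ [green, orange] = readings (Cramer's rule)."""
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("filter channels do not separate the two spectra")
    green = (d * reading_short - b * reading_long) / det
    orange = (a * reading_long - c * reading_short) / det
    return green, orange


wavelengths = list(range(450, 701))
green_band = gaussian_spectrum(546, 70, wavelengths)   # 70 nm width is hypothetical
orange_band = gaussian_spectrum(593, 70, wavelengths)

# Hypothetical filter windows: 450-550 nm (short) and 600-700 nm (long).
mixing = [
    [band_fraction(green_band, wavelengths, 450, 550),
     band_fraction(orange_band, wavelengths, 450, 550)],
    [band_fraction(green_band, wavelengths, 600, 700),
     band_fraction(orange_band, wavelengths, 600, 700)],
]

# Simulate a mixture with a 25-fold excess of the orange emitter.
true_green, true_orange = 1.0, 25.0
short_reading = mixing[0][0] * true_green + mixing[0][1] * true_orange
long_reading = mixing[1][0] * true_green + mixing[1][1] * true_orange

estimated = unmix(short_reading, long_reading, mixing)
print(estimated)
```

With measurement noise added, the minor component's estimate degrades as the excess grows, which is the practical limit the passage refers to.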
Reexamination of the Three-Dimensional Structure of
the Small Subunit of RuBisCo from Higher Plants

The structure of L8S8 RuBisCo (where L is the large subunit and S is the small
subunit) from spinach has been determined to a resolution of 2.8 Angstrom by using
fourfold averaging of an isomorphous electron density map based on three heavy-atom
derivatives. The structure of the S subunit is different from that previously reported for
the tobacco S subunit in spite of 75 percent sequence identity. The elements of
secondary structure, four antiparallel β strands and two α helices, are the same, but the
topology and direction of the polypeptide chain through these elements differ
completely. One of these models is clearly wrong. The spinach model has hydrophobic
residues in the core between the α helices and β sheet as well as conserved residues in
the subunit interactions. The deletion of residues 49 to 62 that is present in the
Anabaena sequence removes a loop region in the spinach model. The positions of three
mercury atoms in the heavy-atom derivatives agree with the assignment of side chains
in the spinach structure.

CHAPMAN et al. (1) HAVE RECENTLY described the tertiary structure of plant
RuBisCo, the key enzyme (2) in the Calvin cycle of carbon dioxide fixation
in photosynthesis. Their model is based on an electron density map to 2.6 Å
of the L8S8 molecule from tobacco. We have determined the structure of L8S8
RuBisCo from spinach to 2.8 Å resolution and find very significant
differences in the structure of the S subunit compared with the reported
tobacco structure. Since there is 75% identity between the amino acid
sequences of these two polypeptide chains, they are expected to have
similar tertiary structures.

Crystals of spinach RuBisCo that diffract to 1.7 Å resolution were grown
from solutions of the activated form of the enzyme with a bound
transition-state analogue (3). These crystals contain one-half the L8S8
molecule in the asymmetric unit. There is a local noncrystallographic
fourfold axis through the molecule, which has approximate 422 symmetry.
X-ray data were collected on the synchrotron radiation source in Daresbury,
United Kingdom, for the native enzyme and three heavy-atom derivatives. An
initial electron density map was calculated with the use of isomorphous
phase angles. These were refined by real-space averaging (4) around the
local fourfold axis. Data collection procedures and phasing statistics have
been briefly described (5).

The final electron density map was of very good quality, as would be
expected by fourfold averaging of an electron density map based on three
heavy-atom derivatives. Almost all of the side chains could easily be
identified from the known sequences of the spinach S and L chains (6, 7),
which comprise 123 and 475 residues, respectively. The sequence of the S
subunit, which was determined by amino acid analysis (8), contains only one
Cys residue, Cys 112. However, two independent determinations of the amino
acid content of the spinach small subunit (8) made in different
laboratories have shown that there are three Cys residues per subunit.
Furthermore, almost all of the small subunits from higher plant RuBisCo for
which the sequences are known contain three Cys residues at positions 41,
77, and 112. We therefore conclude that in all probability the spinach
small subunit also contains Cys residues at these three positions. Our
electron density map also strongly supports Cys side chains at these
positions; the side-chain electron densities are appropriate for Cys
(Fig. 1b).

We first built the L chain (5) using the known structure of L2 RuBisCo from
Rhodospirillum rubrum (9). We found, in agreement with the work on the
tobacco enzyme (1, 10), that higher plant L chains have a structure that is
quite similar to that of the bacterial enzyme (9) except at the carboxyl
terminal. The arrangement of the L subunits in the spinach enzyme into four
dimers
Abstract

Keywords: Luciferase; Bioluminescence; Luciferin; Primary structure; (P. pennsylvanica); (Firefly)

1. Introduction

Firefly luciferase (Photinus luciferin:oxygen 4-oxidoreductase,
EC 1.13.12.7, abbreviated LUC) produces light by the oxidative
decarboxylation of luciferin (LH2) as shown in the following equations
that represent the two-step reaction.

Step one forms an enzyme-bound luciferyl adenylate:

    LH2 + MgATP + LUC <-> LUC-LH2-AMP + MgPPi

Step two is the oxidative decarboxylation of luciferin with the production
of light upon decay of the excited form of oxyluciferin:

    LUC-LH2-AMP + O2 + OH- -> LUC-OL + CO2 + AMP + light + H2O

This is followed by release of the oxyluciferin product, OL, from the
enzyme-product complex. The enzymatic reaction has a quantum yield of
0.88 photon/molecule of luciferin oxidized [1]. The enzyme is widely used
in the quantitation of ATP and other biochemically important compounds
[2-5] and as reporter of gene expression (see bibliographic lists in
J. Biolumin. Chemilumin. 5, 141-152 (1990) and 8, 267-291 (1993)).

* Corresponding author. Fax: +1 (405) 7447799; E-mail:
firefly@biochem.okstate.edu
Present address: Department of Molecular Biology, Holland Laboratory,
American Red Cross, Rockville, MD 20855.
Present address: 1525 SW 7, #8, Albany, OR 97321.
Present address: Department of Horticulture, Michigan State University,
East Lansing, MI 48824.

The gene for firefly luciferase of the North American firefly, Photinus
pyralis, was cloned and sequenced by DeLuca and colleagues [6,7]. Since
those studies, firefly luciferase cDNAs or genes have now been cloned from
several other beetle species; these are listed in Table 1. This paper
reports the cloning and sequencing of a cDNA for firefly luciferase from
Photuris pennsylvanica; this is the first species from the Photurinae
subfamily to have its luciferase sequence determined and reported to
GenBank. Photuris pennsylvanica is a twilight/night-active firefly while
the common, well-characterized North American species, P. pyralis, is
dusk-active, flashing only during twilight. Before mating, Photuris firefly
females respond to courtship signals of conspecific males [15]. After
mating, they become 'femmes fatales' by answering the courtship flashes of
males of other species who are then treated as prey. Predation by
aggressive mimicry is known only for Photuris [16].

The Photuris genus has been mainly studied from the biological standpoint.
The biochemical and structural changes that occur during light organ
development have been studied by Strause et al. [17]. During development
the larval light organ regresses and is replaced by the adult lantern.
During pupation the levels of luciferase and luciferin remain constant in
the posterior half of the pupa while there is an initial increase followed
by a decrease of luciferase and luciferin in the anterior half. Strause and
DeLuca [18] found a luciferase isozyme in larval Photuris pennsylvanica
that is distinct from the enzyme of the adult. This laboratory has
identified two firefly luciferases from adult P. pennsylvanica lanterns
during Sephadex G-150 chromatography [14].



cDNA from Photuris pennsylvanica. The 61 bp leader is shown in groups of
10 bp with a residual. The coding sequence starts at bp 62 and continues
through bp 1700. The 5' and 3' noncoding region is shown in groups of
10 bp. The figure was prepared by editing a DNA Strider report after the
consensus sequence was derived using the AssemblyLIGN program.

[Fig. 1 sequence listing: nucleotide sequence with the deduced amino-acid
sequence; the listing is not legible in this copy.]

... knowledge concerning amino acids that might function ...

... deposited in the GenBank as entry U3T2S).

2. Materials and methods

2.1. Bacterial strains and media

... 200301) used for λZAP Uni-Zap™ library ... 268) was used for
amplification of λZAP Uni-... and preparation. E. coli strain SOLR™ ...

2.2. Firefly collection

... laboratory, sorted according to species, and immediately frozen in
liquid nitrogen ...

2.3. RNA isolation and library preparation

The lanterns of the frozen P. ... liquid nitrogen with a mortar and pestle.
The lantern RNA was isolated by using a Stratagene RNA Isolation Kit
(Cat. # 200349 ...

2.4. Expression

Expression was achieved by following ... similar to that of Devine et al.
[11]. The ... supplied with Stratagene's ... lifted filters were ... The
filters with colonies were switched to 22°C and incubated for another 7 h
... overnight. After ...

2.5. Sequencing

... phagemid was used for sequencing. Sequencing was started using T3 and
T7 primers.



... for alignment of ... all 16 ... and Clustal programs. ... checked by
comparison to an alignment made using the MACAW ...

Fig. 2 (continued).

Subsequent sequencing was continued in both directions using primers
designed using the Oligo 4.06 program (National Biosciences, Inc.) based on
the determined sequence. The primers were synthesized by the Sarkeys
Biotechnology Laboratory.



(which is not found in the Photinus sequence). No
KpnI restriction site is present in the Photinus se-
quence. Devine et al. [11] found clones of Luciola
mingrelica cDNA that had either one or two KpnI
sites.



2.6. Measurement of luciferase gene size 

The plasmid containing the luciferase gene was
digested with EcoRI and XhoI at 37°C overnight. A
separate digestion was done using BamHI and KpnI
under the same conditions. The digested samples
were separated by 1% agarose gel electrophoresis
with a 1 kb DNA ladder (BRL, Cat. No. 1561057) as
a standard and uncut plasmid as a control.
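
Once a sequence is in hand, the presence or absence of the four restriction sites used above can also be checked in silico. The sketch below uses the standard recognition sequences for these enzymes; the test sequence is a toy string invented for illustration, not the luciferase cDNA. Because all four sites are palindromic, searching one strand is sufficient.

```python
# Standard recognition sequences for the enzymes named in the text.
RECOGNITION_SITES = {
    "EcoRI": "GAATTC",
    "XhoI": "CTCGAG",
    "BamHI": "GGATCC",
    "KpnI": "GGTACC",
}


def find_sites(sequence, site):
    """Return 1-based start positions of every occurrence of a site."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(site, start + 1)
    return positions


# Toy sequence (hypothetical) carrying one EcoRI, one BamHI, and one XhoI site.
toy_sequence = "ATGGAATTCAAAGGATCCTTTCTCGAGTAA"
site_map = {
    enzyme: find_sites(toy_sequence, site)
    for enzyme, site in RECOGNITION_SITES.items()
}
print(site_map)
```

An empty list for an enzyme corresponds to the "no site present" result that a gel would show as an uncut band.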

2.7. Computer analysis of data

The following programs and data bases (versions)
were used: AssemblyLIGN, V 1.0.7; Beauty [19],
BLASTPAT, and various other search algorithms via
the BCM Search Launcher of the Human Genome
Center at the Baylor College of Medicine, Houston,
TX; Blocks [20,21]; DNA Strider, V 1.2 [22];
GenBank, NCBI, release 89.0 [23]; MacVector, V 4.5.2
[24]; pI/MW calculated at the ExPASy Server;
ProDom, release 28 [25,26]; ProSite [27]; PROSITE,
release 13 [27]; and SWISS-PROT, release 21.0 [28].



3. Results 

3.1. Isolation and sequencing of cDNA

The cDNA library was prepared from the lanterns
of locally collected P. pennsylvanica fireflies, ex-
pressed in E. coli, and screened for light production
after luciferin addition. Since the screening detected
expressed bioluminescence, only functional cDNA
sequences were identified. The insert size as deter-
mined after EcoRI and XhoI digestion and 1%
agarose gel electrophoresis was about 1.8 kb, which is
sufficient to code for the entire luciferase polypeptide
of approx. 550 amino-acid residues (based on the
length of other firefly luciferases). In a restriction
enzyme-based analysis, the selected clone did not
contain a KpnI restriction site but did have a single
BamHI site within the luciferase-coding sequence
A. Phylogenetic tree

B. Biological classification

Fig. 3. Relationships among the amino-acid residues and the
biological classification of the sixteen sequenced firefly lu-
ciferases. (A) The relationship among firefly species based on
firefly luciferase amino acid sequences as determined using the
protein parsimony and the Clustal algorithm in the DNA Star
program. The length of each pair represents the distance between
the sequence pairs and the scale beneath the tree measures the
genetic distance between sequences. (B) Biological classification
of the firefly species from which the luciferase has been se-
quenced. This classification of firefly species was adapted from
Herring [29] and Campbell [5].
Fig. 1 shows the nucleotide sequence of the P.
pennsylvanica cDNA and the deduced amino-acid
sequence for the largest open reading frame. From
the sequence analysis, the cDNA is 1831 bp long
with an open reading frame (ORF) of 1635 bp. The
ORF encodes a protein of 545 amino acids with a
calculated molecular weight of 60610 daltons. The 5'
untranslated region contains 61 bp and the 3' untrans-
lated region has 135 bp. The 3' noncoding region
contains a poly(A) tail of 24 nucleotides.
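
The lengths quoted in this paragraph are mutually consistent, which a short arithmetic check makes explicit. All numbers come from the text; the variable names are chosen here for illustration.

```python
five_prime_utr_bp = 61
orf_bp = 1635
three_prime_utr_bp = 135

# The three regions should add up to the full cDNA length reported (1831 bp).
cdna_length_bp = five_prime_utr_bp + orf_bp + three_prime_utr_bp

# Each amino acid is encoded by one three-base codon.
codon_count, leftover_bases = divmod(orf_bp, 3)

print(cdna_length_bp, codon_count, leftover_bases)  # 1831 545 0
```

The 545 codons match the 545 amino acids stated for the encoded protein, with no bases left over.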

3.2. Comparison of the deduced amino-acid se- 
quences 

The amino-acid sequences deduced from the 16
cDNAs and genes sequenced for firefly luciferases
have been aligned (see Fig. 2) to allow determination
of conserved amino-acid residues and suggest possi-
ble functional portions. There are 154 residues con-
served among all the luciferases (about 28% of the
total residues). In the putative P. pennsylvanica lar-
val luciferase, the amino acids at 276 positions are
the same at corresponding positions of at least one
other species. One hundred and fifteen amino-acid
residues are unique to the putative P. pennsylvanica
larval enzymes. Of these, 24 residues are conserved
in all other species.
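
Counting residues conserved across all sequences amounts to scanning the columns of a multiple alignment. The sketch below is a minimal illustration on three toy aligned strings (invented, not the luciferase sequences), under the assumption that a column containing a gap is not counted as conserved. The last line checks the percentage quoted in the text, taking the 545-residue P. pennsylvanica protein as the denominator, which is an assumption about what "total residues" refers to.

```python
def conserved_columns(aligned_sequences, gap="-"):
    """Return 0-based indices of columns where every sequence has the same residue."""
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for index in range(length):
        column = {seq[index] for seq in aligned_sequences}
        if len(column) == 1 and gap not in column:
            conserved.append(index)
    return conserved


# Toy alignment (hypothetical): columns 2 and 5 vary, the rest are conserved.
toy_alignment = ["MEDKNIVYGP", "MENKNVVYGP", "MEDKN-VYGP"]
toy_conserved = conserved_columns(toy_alignment)
print(len(toy_conserved), toy_conserved)

# 154 conserved residues out of 545 is about 28%, as stated in the text.
conserved_percent = round(100 * 154 / 545, 1)
print(conserved_percent)
```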

The amino-acid compositions of the sequenced
firefly luciferases are compared in Table 2. The
calculated isoelectric points of putative P. pennsyl-
vanica larval enzymes are the highest of all isozymes.

3.3. Relatedness 

The amino-acid sequences of the firefly luciferases
were analyzed by protein parsimony using the DNA
Star program (Fig. 3A). The phylogenetic classification
for these beetles is shown in Fig. 3B [5,29]. As
expected, the various Luciola species are closely
related. The relationships among the various species
as determined by luciferase amino-acid sequence appear
similar to the relationships based on biological
classification. The percentages of identity and similarity
were calculated for the 16 firefly luciferase sequences
using the GCG program BestFit with a gap
weight of 3.0 (Table 3). The Ppe1 and Ppe2 luciferases
are 57% identical.
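
To make the notion of percent identity concrete, the following Python sketch computes it for two already-aligned sequences. It is not the BestFit algorithm (which also performs the alignment and may define its statistics differently); the function name, the gap character and the toy peptides are assumptions made purely for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of ungapped alignment columns holding identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical toy alignment, not taken from the luciferase sequences.
toy_a = "MEDKN-ILYGP"
toy_b = "MEEKNVVL-GP"
identity = percent_identity(toy_a, toy_b)  # 7 identical out of 9 compared columns
```

Counting only ungapped columns is one common convention; other tools divide by the full alignment length or by the shorter sequence, so reported percentages are only comparable when the convention is the same.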



4. Discussion 
4.1. Related sequences 

When either the cDNA sequence or the predicted
amino-acid sequence was used as the input sequence
for computer-based searches for similarity, the high-scoring
related sequences were the luciferases from
both the Lampyridae and the Elateridae families,
4-coumarate CoA ligase, long-chain CoA ligase,
2-acylglycerophosphoethanolamine acyltransferase, and
the peptide-antibiotic-synthesizing enzymes such as
gramicidin S synthetase and tyrocidine synthetase.
These relationships have been reported [30-38] for
the other firefly luciferases.

4.2. Domain structure 

A domain structure map for the predicted amino-acid
sequence of Photuris firefly luciferase was developed
by using the ProSite, ProDom, PRINTS,
BLOCKS, and PepPepSearch programs. Fig. 4 illustrates
these results and Table 4 defines the sites and
their presumed functions, if known.

The T-250TLGYFT-256 sequence (see also Fig. 2)
is the AMP binding block BL00455B. PS00339, the
AA tRNA ligase II.2 sequence whose consensus sequence is
[GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY],
was found as F-431YIVDRLKSL-440 (correct in
9/10 positions). The ProSite convention is [ambiguities
where indicated amino acids are acceptable] and
{amino acids not accepted in this position}. The
G-338YGLTRYSAVLITPDTDVRPGSTG-362 sequence
is the domain II that is conserved in acyl-adenylate-synthesizing
enzymes [30]. The adenylate
kinase signature, PS00113, with consensus sequence
[LIVMFYW](3)-D-G-[FY]-P-R-x(3)-[NQ], was tentatively
found as I-411NKDGWLRSGDI-422 (correct
in 7/12 positions). The G-415WLRSGD-421 sequence
is the domain III that is conserved in acyl-adenylate-synthesizing
enzymes [30]. The SKL sequence
at the C-terminus is the microbody-directing
sequence and was detected by the SORT program (it
is also ProSite PS00342).
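
The "correct in 9/10 positions" and "7/12 positions" counts can be reproduced by scoring a peptide against a ProSite-style pattern position by position. The Python sketch below is a minimal illustration, not one of the programs named above: the helper names are hypothetical, the two pattern strings are our reading of the consensus sequences quoted in the text, and only fixed-length elements (no ranges such as x(4,12)) are supported.

```python
import re

# One pattern element: [allowed], {forbidden}, a single residue, or x,
# optionally followed by a repeat count such as (3).
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)\))?")


def expand_pattern(pattern: str):
    """Expand a ProSite-style pattern into one (kind, residues) rule per position."""
    positions = []
    for element in pattern.split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.group(1), int(match.group(2) or 1)
        if core == "x":
            rule = ("any", "")
        elif core[0] == "[":
            rule = ("allowed", core[1:-1])
        elif core[0] == "{":
            rule = ("forbidden", core[1:-1])
        else:
            rule = ("allowed", core)
        positions.extend([rule] * repeat)
    return positions


def count_matching_positions(pattern: str, peptide: str):
    """Return (matching positions, total positions) for a peptide of pattern length."""
    positions = expand_pattern(pattern)
    if len(positions) != len(peptide):
        raise ValueError("peptide length must equal the number of pattern positions")
    hits = 0
    for (kind, residues), residue in zip(positions, peptide):
        if kind == "any":
            hits += 1
        elif kind == "allowed" and residue in residues:
            hits += 1
        elif kind == "forbidden" and residue not in residues:
            hits += 1
    return hits, len(positions)


# Pattern strings as read from the text (an assumption of this sketch).
TRNA_LIGASE_II_2 = "[GSTALVF]-{DENQHRKP}-[GSTA]-[LIVMF]-[DE]-R-[LIVMF]-x-[LIVMSTAG]-[LIVMFY]"
ADENYLATE_KINASE = "[LIVMFYW](3)-D-G-[FY]-P-R-x(3)-[NQ]"

print(count_matching_positions(TRNA_LIGASE_II_2, "FYIVDRLKSL"))    # (9, 10)
print(count_matching_positions(ADENYLATE_KINASE, "INKDGWLRSGDI"))  # (7, 12)
```

Scoring partial matches this way differs from a strict ProSite scan, which reports a hit only when every position is satisfied; the relaxed count is what allows "similar" sites such as these to be flagged.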

The AMP-binding domain signature of
[LIVMFY]-x(2)-[STG](2)-G-[ST]-[STE]-[SG]-x-[PALIVM]-K
(ProSite PS00455) was found as
V-194MFSSGTTGVSK-205 and is highly conserved
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10747 





Fig. 4. Putative functional domains of Photuris pennsylvanica
firefly luciferase. The figure is a composite of the BEAUTY map
and ProDom, ProSite and Blocks maps of possible functional
domains. The ProDom sequences are: 3894, luciferin 4-monooxygenase
(firefly luciferase); 170, 4-coumarate CoA ligase;
3895, luciferin 4-monooxygenase; 184, gramicidin S synthetase;
3896, luciferin 4-monooxygenase; 937, 4-coumarate CoA ligase;
10747, 4-coumarate CoA ligase; 10748, 4-coumarate CoA ligase;
10832, 2-acylglycerophosphoethanolamine acyltransferase; 185,
gramicidin S synthetase; and 3891, luciferin 4-monooxygenase.
The ProSite sequences are: PS00455, putative AMP-binding domain
sequence; PS00113, adenylate kinase signature (similar);
PS00339, aminoacyl-tRNA synthetase class II.2 (similar); and
PS00342, microbodies C-terminal targeting sequence. The Blocks
are: BL00455A and BL00455B, the putative AMP-binding domain.



among the various firefly species — the consensus 
sequence for luciferase is L,mM,.,N S S 
^n6)^(io)*fi6)^i6)hu>rf,4)K((^jj (where the subscript 

Table 4 

ProDoms and ProSites found in firefly luciferase
Element 



•19 

number in parentheses is the number of occurrences 
in the 16 sequences). 

Among the sequences not found (whose possible
existence was considered on the basis of similarity in
function) is the consensus pattern of
[FYH]-R-x-[DE]-x(4,12)-[RH]-x(3)-F-x(3)-[DE], ProSite
PS00179, which is the amino-acid tRNA ligase II.1
pattern. ProSites that were sought but not found
include: the ATP/GTP-binding site motif A (P-loop)
PS00017, the chloramphenicol acetyltransferase active
site PS00100, the protein kinase signatures
(PS00107, PS00108, and PS00109), the ubiquitin-activating
enzyme signatures (PS00536 and
PS00865), and the acyl-CoA-binding protein signature
PS00880. Since the P-loop occurs in many proteins
that bind ATP/GTP, it was the subject of a
search. Protein kinases have binding sites for ATP
and there is some similarity. The CoA binding domains
were sought because CoA influences the time-course
of light production by firefly luciferase [39-41].

The domains that can be recognized involve
ATP(AMP)-binding sites, regions that interact with
ATP, regions that are involved in reactions leading to



Type 



ProDom 

3891 

3894 

3895 

3896 

184 

185 

937 

I074H 

10747 

10832 

f*roSite 
PSOOII.'^ 
P$0O339 
PS00342 
PS0O455 
motif t 
molif 2 



Length, 
residue 



Number 
proteins 



% identity 
with Ppe2 



Identified 
function 



Firefly luciferase 
Firefly luciferase 
Firefly luciferase 
Firefly luciferase 
Gramicidin S synihctaic 

Gramicidin S synthetase 
4-coumaratc-CoA ligase 
4-coumarate-CoA ligase 
4-coumarate CoA Ugw 

2-acylglycefophosphoeihanolamine 
acyitransferase 

Adenylate kinase 
AA tRNA ligase II 
Firefly luciferasc 



30 


3 


60 


50 


3 


52 


61 


3 


49 


51 


3 


67 


34 


45 


50 


43 


45 


42 


25 


13 


45 


27 


! 


52 


21 


1 


44 


24 


1 


42 


12 


45 


58 


10 


99 


80 


3 


88 


100 




85 





12 
9 



69 
67 



Unknown 
Unknown 
Unknown 
Unknown 
Unknown 
Unkjiox)c>Ti 
Unknown 
Unknown 
Unknown 
Unknown 



Adenylate kinase signature 
Signature 

C-terminal microbody-directing 
AMP<binding signature 



Whole sequence. *^ *PP«^ ^ insufficient sequence homology over the 
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the formation of adenylate intemiediates. and func- 
tion in peptide synthetases. As expected, there are 
several regions that are found in other firefly luciferases.
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4.3. Evolutionary relationships 



Wood [42] has reviewed the information on chemical
mechanism and evolutionary development of the
beetle luciferases. The level of dissimilarity among
the beetle luciferases is large; only 27% of the
amino-acid sequence is conserved among the cloned
beetle luciferases. Wood concluded that the rate of
evolution of the luciferases is high relative to other
enzymes.

Wood presented a tree diagram showing the relationship
of the beetle luciferases with the CoA synthetases
[42]. The closest enzyme to the luciferases is
4-coumarate:CoA ligase, which has 17% identity in
amino-acid sequences. Wood [42] postulates that the
firefly luciferases may have evolved from the CoA
synthetases. CoA influences the pattern of light production
by the luciferases. Without CoA, saturating
concentrations of ATP produce a flash of light. CoA
prevents the subsequent inhibition of light production
and allows a sustained production of light. Wood [40]
and Ford et al. [41] found that the -SH group of CoA

"cLT'^'l ^"'^''y O'her nu. 

cleonde analogs can produce a steady light produc- 

suUstf i° CoA f43] n-ese 

results have been interpreted as an enhanced turnover
of the enzyme mediated by conformational changes.

A -"ay be a vest]- 

giaJ CoA bmding site on the luciferases. The civstal 

ZTlsV"^ '"Chloramphenicol acetyS 
uTSrtl ^ '-^S A resolution by 

[«nt^l^ "^"^ '''"^'"8 site analyzed 

TtO^ ; n r;i^- ^^-'52. K-177. Y-178. 

ir /""'- t P^"'^™ 
occurs .n P. pertnsylvamca and other firefly lu- 

cfcrases (the residue number corresponds to L P. 

^'^ 'he number 

of t mcs that the residue is found in the aligned 

V16: K442. 16/16; Y443. 16/16 Hi^. '27,1: 



4.4. Crystal structure 



The crystal structure of recombinant P. pyralis
luciferase has been determined at a resolution of 2.0

Si ?o^^ '""^ '=°'"P^« domains _ the 
N-terminal 80% and the C-terminal 20%. The N-terminal
domain contains a β-barrel and two β-sheets
which are flanked by α-helices. The C-terminal

ZV^TaI'^''. ^"'^P^^'Jlel ^strands and a 
three-stranded mixed β-sheet, with three helices
packed against the side. There is a large cleft between

the f«fly luciferases a„^ i„ both parts. Tf« P-.oop is 
^ a loop connecting antiparallel strands 6 and 7 of 

wifir ! t' "^^^ ' '"'S' conformational change 
when substrates are bound and establishment of a
nonpolar environment would ensure high quantum
yield. Conti et al. [46] suggest that the C-terminal
region moves to form a cap. The amino-acid residues

with CoA are found on both sides of the cleft. There
are two peptide sequences that are labeled with radioactive
thiourea dioxide, a lysine reagent [14].
Those two peptides are positioned to be part of the

orr' '"t ^m""' and the 

other m *e N-terminal domain on faces of the cleft 

numbenng) suggested above are predicted to be a 
part of the P pyralis active site containing (P 
pyratts sequence numbers) S-198. K-206 Y-340 E- 

B f;''''rJ,f^'> ^-^21. and 0-422 [4^].^ 

Baldwin [47], in a review of the firefly luciferase
structure that reveals a new protein fold, concluded
that the 'mystery remains' as to the molecular mechanism
of catalysis. The crystals were obtained without
bound substrates or other ligands and the suggested
active site is based on those amino-acid

Snvll?' ^ ^"Perfamily of 

adenylate-fonmng enzymes. 

Elucidation of the amino-acid residues involved in
catalyzing the firefly luciferase reactions awaits the
ad^r"^" ^^-c^-^c^ modification experi- 
2 site-directed mutagenesis ..n.Sies. 

crystallographic analysis of the
protein structure with bound reactants. The large
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conformational changes that occur during catalysis 
suggest that the dynamics of these changes are important
in understanding the mechanism. Firefly luciferase
differs from the adenylate-forming enzymes
used in the structural comparisons described above
because luciferase is a monooxygenase and there
must be conserved amino-acid residues for that activity.
The tools are rapidly becoming available for a
better understanding of this bioluminescent process.
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ABSTRACT An efficient /3-fucosidase was evolved by 
DNA shuffling from the Escherichia coli lacZ β-galactosidase.
Seven rounds of DNA shuffling and colony screening on
chromogenic fucose substrates were performed, using 10,000
colonies per round. Compared with native β-galactosidase,
the evolved enzyme purified from cells from the final round
showed a 1,000-fold increased substrate specificity for
o-nitrophenyl fucopyranoside versus o-nitrophenyl galactopyranoside
and a 300-fold increased substrate specificity for
p-nitrophenyl fucopyranoside versus p-nitrophenyl galactopyranoside.
The evolved cell line showed a 66-fold increase in
p-nitrophenyl fucosidase specific activity. The evolved fucosidase
has a 10- to 20-fold increased kcat/Km for the fucose
substrates compared with the native enzyme. The DNA sequence
of the evolved fucosidase gene showed 13 base changes,
resulting in six amino acid changes from the native enzyme.
This effort shows that the library size that is required to
obtain significant enhancements in specificity and activity by
reiterative DNA shuffling and screening, even for an enzyme
of 109 kDa, is within range of existing high-throughput
technology. Reiterative generation of libraries and stepwise
accumulation of improvements based on addition of beneficial
mutations appears to be a promising alternative to rational
design.



Proteins and enzymes with novel functions and properties can 
be obtained either by searching the largely unknown natural 
species or by improving upon currently known natural proteins 
or enzymes. The latter approach may be more suitable for 
creating properties for which natural evolutionary processes 
are unlikely to have been selected. 

One promising strategy to create such novel properties is by 
directed molecular evolution. Starting with known natural 
protein(s), multiple rounds of mutagenesis, functional screen- 
ing, and amplification can be carried out. When the mutation 
rate, library size, and selection pressures are properly bal- 
anced, the desired phenotype of a protein generally increases 
with each round (1-8). The advantage of such a process is that
it can be used to rapidly evolve any protein, without any
knowledge of its structure.

A number of different mutagenesis strategies exist, such as 
oligonucleotide cassette mutagenesis, point mutagenesis by 
error-prone PCR or the use of mutator strains, as well as DNA 
shuffling (1-5, 8). A theoretical approach to choosing a 
preferred mutagenesis strategy would be to determine the 
target protein's fitness landscape (9), which is a plot of fitness
(on the y axis) versus sequence space (on the x axis). However,
because the sequence space of an average protein of 500 amino
acids is 20^500, determination of even a fraction of the fitness
landscape is a nearly impossible and impractical undertaking.
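
To give a sense of scale, the size of this sequence space can be computed exactly. The Python sketch below is only an illustration (variable names are ours); it counts every sequence of 500 residues drawn from the 20 standard amino acids.

```python
import math

alphabet_size = 20   # standard amino acids
protein_length = 500

# Python integers are arbitrary precision, so the exact value is available.
sequence_space = alphabet_size ** protein_length

decimal_digits = len(str(sequence_space))
log10_size = protein_length * math.log10(alphabet_size)

print(decimal_digits)        # 651
print(round(log10_size, 1))  # 650.5
```

A number with 651 decimal digits dwarfs the roughly 70,000 clones screened over seven rounds in this study, which is why exhaustive mapping of the landscape is not an option.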
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Because there are just a few fundamental ways to search 
sequence space, it may be informative to compare the perfor- 
mance of these methods for specific model systems. 

Natural genes are thought to have evolved by mutation and 
recombination within a population of diverse, but highly 
related, sequences. We suggest that a search algorithm similar 
to that which slowly created the fitness landscape of a natural 
protein in the first place is likely to also be the preferred 
method for further searching this natural sequence landscape 
(5, 10). This approach is supported by our demonstration of the 
advantage of recombining mutations (over introduction of 
point mutations alone) for increasing the activity of a natural 
β-lactamase protein (2). However, recombination may not
always be the best search algorithm. For searching the fitness
landscapes of nonnatural sequences under unusual conditions,
it is conceivable that a different approach may be more 
optimal. 

We obtain in vitro recombination of infrequent point mu- 
tations by a PCR-based technique called DNA shuffling (1-5). 
A pool of closely related sequences is fragmented randomly, 
and these fragments are reassembled into full-length genes via 
self-priming PCR and extension in a process we call reassembly 
PCR (4). This process yields crossovers between related se- 
quences due to template switching. Shuffling allows rapid 
combination of positive-acting mutations and simultaneously 
flushes out negative-acting mutations from the sequence pool 
(Fig. 1). When coupled with effective selection and applied 
reiteratively, such that the output of one cycle is the input for
the next cycle, reiterative DNA shuffling has been demon- 
strated to be an efficient process for directed molecular 
evolution (1-3). 
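
The recombination step described above can be caricatured in silico. The Python sketch below is a toy model only, under strong simplifying assumptions: parents are equal-length strings, fragment lengths are drawn uniformly, and the template may switch at every fragment boundary. It does not model priming, annealing, or polymerase errors, and the function name and parameters are hypothetical.

```python
import random


def shuffle_parents(parents, mean_fragment=4, rng=None):
    """Assemble one chimeric sequence by copying successive fragments
    from randomly chosen parents (a stand-in for template switching)."""
    rng = rng or random.Random()
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must have equal length")
    pieces = []
    position = 0
    while position < length:
        template = rng.choice(parents)                       # possible crossover
        fragment_length = rng.randint(1, 2 * mean_fragment - 1)
        pieces.append(template[position:position + fragment_length])
        position += fragment_length
    return "".join(pieces)


# Two hypothetical parents that differ by point mutations at two positions.
parent_a = "ACGTACGTACGTACGTACGT"
parent_b = "ACGAACGTACGTACGTACCT"
child = shuffle_parents([parent_a, parent_b], rng=random.Random(0))
```

Each position of the child is inherited from one parent, so mutations located in different parents can end up combined in a single progeny sequence or be separated again, which is the property the text attributes to shuffling.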

In our previous shuffling studies we used selection and/or 
large libraries (1, 2). Our primary goal in this work was to 
determine whether detection by screening of libraries of 10,000 
clones, a number that is within range of any high throughput 
screening procedure, would be sufficient to obtain significant 
enhancement of a minor activity of β-galactosidase, a highly
specific and complex enzyme, and at 109 kDa, one of the 
largest single-chain proteins in Escherichia coli. If screening 
would detect significant improvement, we then would establish 
that improvements are obtainable by evolution with such small 
libraries. 

E. coli β-galactosidase, encoded by lacZ (11), is widely used,
and its biological function, catalytic mechanism, and molecular
structures are well characterized (11-15). It is a tetramer of
identical subunits of 1,023 amino acids (13, 16, 17). The crystal
structure of β-galactosidase is solved and shows that each
subunit forms five structural domains (14). Each active site
resides mainly in one subunit, but part of another subunit also
is involved (14). The native enzyme hydrolyzes β-galactosyl
linkages, such as the β(1,4)-linkage in its natural disaccharide



Abbreviations: ONPG, o-nitrophenyl β-D-galactopyranoside; ONPF,
o-nitrophenyl β-D-fucopyranoside; PNPG, p-nitrophenyl β-D-galactopyranoside;
PNPF, p-nitrophenyl β-D-fucopyranoside; X-Fuc,
5-bromo-4-chloro-3-indolyl β-D-fucopyranoside.
*To whom reprint requests should be addressed, e-mail: 
maxygcn@maxygcn.com. 
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Fk;. 1. Schematic illustration of the DNA shulfling process used 
in the present study. 

substrate, lactose. The native β-galactosidase is known to be
highly specific for β-D-galactosyl substrates. A multistep model
of the reaction was proposed (18, 19) based on kinetic studies
of the native enzyme for o-nitrophenyl β-D-galactopyranoside
(ONPG), p-nitrophenyl β-D-galactopyranoside (PNPG), and
other substrates and substrate analogs. The native β-galactosidase
acts only weakly on β-D-fucosyl moieties (18-20) and
does not act on most substrate analogs.

MATERIALS AND METHODS 

E. coli β-galactosidase (EC 3.2.1.23) and the galactosyl and
fucosyl substrates 5-bromo-4-chloro-3-indolyl β-galactopyranoside
(X-Gal), PNPG, ONPG, 5-bromo-4-chloro-3-indolyl
β-D-fucopyranoside (X-Fuc), p-nitrophenyl β-D-fucopyranoside
(PNPF), and o-nitrophenyl β-D-fucopyranoside (ONPF)
were purchased from Sigma. Plasmid pCH110 containing a
lacZ gene was from Pharmacia. E. coli strain TB1 was a gift
from Charles Roessner of Texas A&M University.

Construction of Plasmid p18lacZ. A 3.8-kb HindIII and
BamHI restriction nuclease fragment from pCH110 containing
a lacZ gene (codon 8 fused to a short N-terminal peptide)
and the gpt promoter region (21) was subcloned into the
HindIII and BamHI sites of the p18-sfi-kan-sfi vector, a
2.3-kb pUC18 derivative in which the ampicillin gene is
replaced by a kanamycin phosphotransferase gene (2). The
resulting plasmid, named p18lacZ, was used for DNA shuffling.
DNA fragments of 50-200 bp were used and reassembled
as described previously (1, 2). The PCR primers for
amplification of the reassembled genes were
AGCGCCCAATACGCAAACCGCCTCTCCCCGCGCGTTGGCC
(forward) and
CTATGCGGCATCAGAGCAGATTGTACTGAGAGTGCACCAT
(reverse), located on either side of the
BamHI and HindIII fragment. The reassembled gene was
digested with restriction enzymes HindIII and BamHI and
ligated back into the p18-sfi-kan-sfi vector. The ligation mixture
was electroporated into E. coli TB1 competent cells and
plated out on Luria-Bertani plates (150 mm) with 40 μg/ml
kanamycin and 2 mg/plate of the X-Fuc substrate (22). The
plates were incubated 12 to 24 hr at 37°C. The resulting
kanamycin-resistant transformants were visually screened for
the intensity of the blue color. The 20-40 colonies with the
most intense blue color were picked from about 10,000 transformants
of each round and used for the next round of DNA
shuffling. Seven rounds of DNA shuffling and screening were
carried out. The best clone from the final screening round,
called evolved β-fucosidase, was characterized in detail.

Enzyme Purification. For purification of the native β-galactosidase
and the evolved β-fucosidase, a histidine tag (His6)
was fused to the N terminus of both enzymes by PCR with two
primers [5'-(P)CATCACCATCACCACCATATCGTCACCTGGGACATGT
and 5'-(P)GTATTTTTCGCTCATGTGAA]
in a standard PCR. The histidine-tagged native and
evolved enzymes were purified from overnight TB1 cell cultures
harboring the corresponding plasmid (23). The crude cell
extract, in 50 mM phosphate (pH 7.0) with 100 mM NaCl and
0.2 mM of phenylmethylsulfonyl fluoride protease inhibitor,
was passed through a 20-ml Ni-nitrilotriacetic acid agarose
(Qiagen) column. The bound protein was stepwise-eluted with 
the same buffer containing 5 mM, 10 mM, 25 mM, and 100 mM 
imidazole. The active fractions from the metal affinity column 
were desalted and loaded on a DEAE column in 20 mM Tris 
(pH 7.5), followed by elution with a 0 to 1 M NaCl gradient. 
The active fractions were concentrated and loaded on a 
Superose 12 gel filtration column in an FPLC protein purifi- 
cation unit (Pharmacia). SDS/PAGE analysis (data not 
shown) showed that the native galactosidase and the evolved 
fucosidase were greater than 90% pure. 

Enzyme Kinetics. β-Galactosidase activity was assayed using
the synthetic chromogenic substrates ONPG and PNPG. β-Fucosidase
activity was assayed using chromogenic fucosyl substrates
ONPF and PNPF. Enzyme assays were performed at
25°C and pH 7.0 in 30 mM N-tris(hydroxymethyl)methylaminoethanesulfonic
acid with 1 mM MgCl2 and 150 mM NaCl.
The absorbance change at 420 nm was recorded with time, and
product formation was quantitated using the absorption extinction
coefficient (2.65 mM^-1·cm^-1 for o-nitrophenol and
6.7 mM^-1·cm^-1 for p-nitrophenol). For kinetic parameter
measurements, the initial velocity V0 (when less than 10% of
the substrate was converted into product) was determined with
varied substrate concentrations. The values of Vmax and Km
were calculated using the simple weighting method of Cornish-Bowden
(24). The Vmax values were converted to kcat values, the
turnover number per active site, by normalizing for the enzyme
concentrations by the molecular mass of the monomer. The
Km and kcat values of the wild-type β-galactosidase for ONPF
could not be determined directly because of the low activity on
this substrate. The kcat/Km value had to be estimated from the
enzyme dilution factor required for the native enzyme to
generate the same amount of o-nitrophenol product from
ONPG after the same period of time (usually several hours)
and from the kcat/Km value of the wild-type enzyme on ONPG.
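
The conversion from absorbance change to product concentration follows the Beer-Lambert relation, c = ΔA / (ε·l). The Python sketch below illustrates that step with the two extinction coefficients quoted above; the 1-cm path length, the function names, and the example readings are assumptions for illustration and are not taken from the paper.

```python
# Extinction coefficients at 420 nm quoted in the text, in mM^-1 cm^-1.
EXTINCTION_MM_CM = {
    "o-nitrophenol": 2.65,
    "p-nitrophenol": 6.7,
}


def product_concentration_mM(delta_absorbance, product, path_length_cm=1.0):
    """Beer-Lambert: concentration = delta A / (epsilon * path length).
    The 1-cm default path length is an assumption of this sketch."""
    epsilon = EXTINCTION_MM_CM[product]
    return delta_absorbance / (epsilon * path_length_cm)


def initial_velocity_mM_per_s(delta_absorbance, seconds, product, path_length_cm=1.0):
    """Average rate of product formation over the observation window."""
    return product_concentration_mM(delta_absorbance, product, path_length_cm) / seconds


# Hypothetical readings: an absorbance rise of 0.265 corresponds to about 0.1 mM o-nitrophenol.
example_mM = product_concentration_mM(0.265, "o-nitrophenol")
example_rate = initial_velocity_mM_per_s(0.67, 100.0, "p-nitrophenol")
```

For an initial-velocity measurement the window should be short enough that less than 10% of the substrate has been consumed, as the text specifies.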

Sequencing of the Evolved lacZ Gene. The 3.8-kb DNA 
fragment encoding the evolved )3-galactosidase and its flank- 
ing regions was sequenced in both forward and reverse direc- 
tions with 20 primers using an Applied Biosystems 391 DNA 
sequencer. 

RESULTS AND DISCUSSION 

Strategy for Evolving β-Galactosidase. The primary goal of
the experiment was to determine if a substantial enhancement
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in the specificity and/or activity of a large model enzyme could 
be obtained by reiterative screening of libraries of a size 
(10,000 clones) that is routinely accessible by high throughput 
detection assays. No structural information was used in the 
design of the experiment, but the structure of β-galactosidase
is useful for interpretation of results.

Screening for Improved Fucosidase Activity. The 3.8-kb
DNA fragment of p18lacZ containing the lacZ gene was
shuffled as described previously (1-3), and the reassembled
genes were digested with restriction enzymes (HindIII and
BamHI) and ligated back into the vector p18-sfi-kan-sfi. The
initial diversity was introduced into the native lacZ gene by
random point mutagenesis, which occurs by shuffling of small
fragments (1, 2). We previously showed that shuffling with 10-
to 50-bp fragments resulted in a 0.7% rate of point mutation.
Here we used fragments of 50 to 200 bp, which results in a
much lower rate of point mutation, resulting in inactivation of
approximately 20% of the clones. X-Fuc was chosen as the
indicator substrate for the plate assay because of the nondiffusible
nature of the colored product and the high sensitivity
(22). After each round of DNA shuffling, 10,000 kanamycin-resistant
transformants, growing on plates supplemented with
X-Fuc, were visually screened for enhanced blue color formation.
About 2-5% of the transformant colonies in each round
were more intensely blue than the
bulk of the population. The 20-40 bluest colonies (0.2-0.4%)
were picked at each round, individually verified to be more
active than the pool from the previous round by plate assays,
and then used as a pool for the source of DNA to initiate the
next round of DNA shuffling. This number of colonies was
chosen as a compromise between obtaining too little diversity
(<10 colonies) and obtaining suboptimal selection pressure
(≫100 colonies), which could limit the rate of improvement.
In the seventh round of DNA shuffling some colonies developed
a deep blue color after overnight growth (Fig. 2). One
mutant from this seventh and final round of shuffling showed
a 66-fold increase in fucosidase activity on 1 mM PNPF (Fig.
3).
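
The selection stringency quoted above follows directly from the colony counts. The short Python sketch below (variable names are ours, for illustration only) reproduces the 0.2-0.4% figure and the total number of clones examined over the seven rounds.

```python
library_size = 10_000            # transformants screened per round
picked_low, picked_high = 20, 40  # bluest colonies carried forward
rounds = 7

# Fraction of each library carried into the next round, in percent.
fraction_low = 100 * picked_low / library_size
fraction_high = 100 * picked_high / library_size

# Total clones examined over the whole experiment.
total_screened = rounds * library_size

print(fraction_low, fraction_high)  # 0.2 0.4
print(total_screened)               # 70000
```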







Fig. 2. E. coli TB1 cells expressing the native β-galactosidase
(white colonies, Upper Left) and the evolved fucosidase of the seventh
round (blue colonies, Upper Right) after overnight growth on a
Luria-Bertani plus kanamycin plate supplemented with 0.1 mM
X-Fuc. (Lower) The results of plating a deliberate mixture of the two
types of colonies.
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Fig. 3. Whole cell fucosidase activity on PNPF of the pool of 
colonies selected after each round of DNA shuffling. Rounds 1-7 are
pools of colonies. Also shown are the activity of cells expressing the
native β-galactosidase and cells expressing the evolved β-fucosidase,
both measured as whole cell activity of single clones. The evolved
fucosidase is the single-best colony selected after quantitative comparison
of the 24 best colonies from the pool of colonies obtained after
shuffling round 7. For assay conditions, see Materials and Methods.

Kinetics. After the final round of selection, (His)6 tags
were added to the foreign N terminus of the native β-galactosidase
and the evolved β-fucosidase enzymes. Both
enzymes were purified, and the kinetic constants of each
enzyme on the synthetic chromogenic substrates ONPG,
ONPF, PNPG, and PNPF were determined (Table 1). For
PNPF, the Km value of the evolved fucosidase is decreased
by 20-fold from the Km of wild-type β-galactosidase on the
same substrate. The kcat value is decreased about 2-fold. The
kcat/Km values thus are increased about 10-fold in the evolved
β-fucosidase. The activity of the wild-type enzyme on ONPF
was very low and accurate Km and kcat values could not be
obtained. By comparing the relative reaction rates of the
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Kinetic constants for the native and evolved enzymes 






Substrate     Kinetic constant               Native          Evolved
                                             galactosidase   fucosidase

PNPG          kcat, s^-1                     268             30.9
              Km, mM                         0.04            0.18
              kcat/Km, mM^-1·s^-1            6,700           172
PNPF          kcat, s^-1                     209             96.6
              Km, mM                         31              1.5
              kcat/Km, mM^-1·s^-1            6.7             64.4
Specificity   (kcat/Km)PNPG/(kcat/Km)PNPF    1,000           2.7

ONPG          kcat, s^-1                     765             14.5
              Km, mM                         0.11            0.11
              kcat/Km, mM^-1·s^-1            6,950           132
ONPF          kcat, s^-1                                     24.1
              Km, mM                                         0.55
              kcat/Km, mM^-1·s^-1            (2)*            43.9
Specificity   (kcat/Km)ONPG/(kcat/Km)ONPF    3,200           3.0


The native galactosidase and the evolved fucosidase were purified,
and the enzymes were assayed on four different substrates.
*The kcat/Km value for the native galactosidase on ONPF was estimated
to be about 2 mM^-1·s^-1 by measuring the hydrolysis rate
relative to that of ONPG.
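
The derived rows of the table can be recomputed from the kcat and Km entries. The Python sketch below uses the values as we read them from the table (the evolved-enzyme kcat of 30.9 on PNPG is an assumption, chosen because it is consistent with the reported kcat/Km of 172); the dictionary layout and names are ours.

```python
# (kcat in s^-1, Km in mM), keyed by (substrate, enzyme), as read from the table.
kinetics = {
    ("PNPG", "native"): (268.0, 0.04),
    ("PNPG", "evolved"): (30.9, 0.18),   # assumed reading of a garbled entry
    ("PNPF", "native"): (209.0, 31.0),
    ("PNPF", "evolved"): (96.6, 1.5),
    ("ONPG", "native"): (765.0, 0.11),
    ("ONPG", "evolved"): (14.5, 0.11),
    ("ONPF", "evolved"): (24.1, 0.55),
}

# Catalytic efficiency kcat/Km in mM^-1 s^-1.
efficiency = {key: kcat / km for key, (kcat, km) in kinetics.items()}

# Galactoside-over-fucoside specificity for the p-nitrophenyl pair.
specificity_native = efficiency[("PNPG", "native")] / efficiency[("PNPF", "native")]
specificity_evolved = efficiency[("PNPG", "evolved")] / efficiency[("PNPF", "evolved")]

# Change in relative specificity toward the fucosyl substrate.
specificity_shift = specificity_native / specificity_evolved

for key in sorted(efficiency):
    print(key, round(efficiency[key], 1))
print(round(specificity_native), round(specificity_evolved, 1), round(specificity_shift))
```

With these inputs the p-nitrophenyl specificity shifts by a few hundred-fold, in line with the roughly 300-fold change reported for that substrate pair.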
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Fig. 4. Nucleotide substitutions in the evolved fucosidase gene. The predicted amino acid changes are shown above the gene by the single-letter 
denotation, numbered according to the wild-type β-galactosidase sequence (17). Amino acid changes in the N-terminally fused peptide region
(hatched area) are indicated by small vertical arrows. Mutations that do not result in amino acid changes are shown below the gene, numbered
starting at the HindIII site, as in the parental plasmid pCH110. The gpt promoter is indicated by a thick arrow. The positions of the known active
site residues of the wild-type β-galactosidase are indicated by black bars.



wild-type enzyme on ONPF and ONPG (at the same enzyme
and substrate concentrations), the kcat/Km for ONPF was
estimated, assuming that the kcat/Km value is a second order
rate constant. The kcat/Km values on ONPF were increased
at least 20-fold in the evolved β-fucosidase. These increases
in fucosidase activity were accompanied by decreases in
galactosidase activity. For the substrates PNPG and ONPG,
the kcat/Km is decreased 40-fold and 50-fold, respectively.
These kinetic parameter changes suggest that the substrate
binding pocket in the evolved β-fucosidase is different from
that of the wild-type β-galactosidase.

The native enzyme is highly specific for hydrolyzing galactosyl
rather than fucosyl substrates. The kcat/Km values we
determined for PNPG and PNPF differ by about 1,000-fold,
and for ONPG and ONPF the values differ by more than
3,000-fold (Table 1). The values we determined for the native
β-galactosidase on ONPG, PNPG, and PNPF are in between
the values reported previously (18, 20). The substrate specificity
changed dramatically from the native β-galactosidase to
the evolved β-fucosidase. For the evolved β-fucosidase the
kcat/Km values for substrates PNPG and PNPF differ 2.7-fold
and for substrates ONPG and ONPF the kcat/Km values differ
3-fold. Therefore, the relative substrate specificity for fucosyl
substrates, from the native to the evolved enzyme, is increased
1,000-fold for the o-nitrophenyl substrates and 300-fold for the
p-nitrophenyl substrates. The substrate specificity change was
further supported by inhibition of the enzymatic activity by
isopropyl β-D-thiogalactopyranoside, a β-galactosidase substrate
analog and a competitive inhibitor of galactosyl substrates.
The Ki values increased by one order of magnitude
from the wild-type enzyme to the evolved β-fucosidase, from
0.1 mM to 0.9 mM. The changes in Km values for the galactosyl
substrates showed the same trend, because they either increased
severalfold or stayed the same. These results imply that
the substrate binding affinity of the evolved β-fucosidase is
substantially increased for fucosyl substrates and decreased for
galactosyl substrates, and hence the substrate binding pocket
is likely to be significantly modified.

DNA Sequence. The DNA sequence of the evolved fucosidase
gene showed 13 nucleotide substitutions of which 11 were
in the coding region. Six of the mutations are predicted to
cause amino acid changes in the translated β-galactosidase
sequence. Two additional mutations are predicted to cause
amino acid changes in the N-terminal fusion peptide (Fig. 4).




Fig. 5. Ribbon representation of the E. coli β-galactosidase subunit structure (14). The CNO atoms of the six amino acid mutations that
conferred the fucosidase activity are shown with stick representation. Two mutations in the active site (Asp604 and Gln572) are shown in red. Two
mutations in close proximity of the active site (Pro511 and Asp908) are shown in magenta. Two mutations far away from the active site and on
the protein surface (Val9 and Gln135) are shown in green. The rest of the substrate binding and active site residues are shown in yellow.
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All 13 nucleotide changes were base transitions between 
purines and/or between pyrimidines, which usually are more 
frequent than transversions. 
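
The distinction between transitions and transversions is easy to state in code. The Python sketch below is illustrative only (the function name is ours): a substitution is a transition when both bases are purines (A, G) or both are pyrimidines (C, T), and a transversion otherwise.

```python
from itertools import permutations

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def classify_substitution(ref: str, alt: str) -> str:
    """Classify a single-base change as a 'transition' or a 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in BASES or alt not in BASES:
        raise ValueError("bases must be one of A, C, G, T")
    if ref == alt:
        raise ValueError("reference and altered base are identical")
    pair = {ref, alt}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Of the 12 possible ordered substitutions, 4 are transitions and 8 are transversions.
counts = {"transition": 0, "transversion": 0}
for ref, alt in permutations("ACGT", 2):
    counts[classify_substitution(ref, alt)] += 1
print(counts)  # {'transition': 4, 'transversion': 8}
```

Because only a third of the possible substitutions are transitions, finding 13 transitions out of 13 changes reflects a strong mutational bias rather than chance.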

One major advantage of m vitro evolution of enzymes over 
the structural modeling approach is that only minimal in- 
formation is required for improving the desired phcnotypc. 
At each round of our experiment, only colonies with in- 
creased fucosidase activity were pooled and used for the next 
round of DNA shuffling and screening. Although both 
positive-acting mutations and neutral mutations may accu- 
mulate in the evolved fucosidase lacZ gene in each round, wc 
expect that neutral mutations generally do not survive 
multiple rounds of shuffling and screening due to a back- 
crossing effect exerted by the consensus sequence (1, 2), 
combined with the lack of a selective advantage of the 
neutral mutations. Therefore, only mutations that somewhat 
contribute to the improved fucosidase activity are likely to 
accumulate in the evolved fucosidase. While we have not 
determined the effect of the separate mutations by site-specific 
mutational studies, we can predict what roles some 
of the mutations may play based on the three-dimensional 
structure of the parental β-galactosidase (ref. 14; Fig. 5) and 
the sophisticated kinetic models based on previous muta- 
tions and kinetic analysis of purified proteins (18, 20, 25, 26). 
Among the six amino acid changes in the β-galactosidase 
sequence, none appear directly involved in the inter-subunit 
contact. Three mutations (Pro511Ser, Gln573Arg, 
and Asn604Ser) are located in domain 3 (residues 
334-627) of the wild-type E. coli β-galactosidase (14). 
Domain 3 in the native protein contains most of the amino 
acids that form the substrate binding pocket (ref. 14; Fig. 5). 
Asn604 is one of the amino acids forming this substrate 
binding pocket in the protein (14), and this residue is 
conserved in several other known β-galactosidase sequences 
(25, 27-29), except the evolved galactosidase gene (ebgA) of 
E. coli (30). In our evolved fucosidase, Asn604 is replaced by 
Ser. This mutation presumably affects the enzyme's sub- 
strate specificity. All the other mutations found in the 
evolved β-fucosidase enzyme do not directly affect the active 
site and substrate binding pocket residues, and therefore 
they may have no effect or may only subtly change the 
conformation of the active site and substrate specificity. 
Gln573, substituted by Arg in the evolved fucosidase, is in 
close proximity to the substrate binding pocket (Fig. 5). The 
mutation Pro511Ser is also close to the active site and 
substrate binding pocket (Fig. 5). These two mutations are 
likely to affect the enzyme's active site. Asp908Asn is also 
close to the active site and may also affect the activity. 
Additional important catalytic residues of the active site, 
such as Glu461, Met502, Tyr503, and Glu537 (23, 26, 31), 
however, are unchanged in the evolved β-fucosidase, implying 
that the catalytic mechanism of the evolved enzyme 
remained the same. Therefore the evolved β-fucosidase 
seems to have only adjusted to fit the fucosyl substrate or its 
transition state better than the wild-type β-galactosidase 
does. In addition, one of the nucleotide mutations outside 
the structural gene (c23t) is very close to the gpt promoter 
region and could affect transcription (Fig. 4). This mutation, 
along with the two amino acid mutations in the N-terminal 
fusion peptide (Fig. 4), may influence the expression level of 
the protein. Indeed, we found that the evolved β-fucosidase 
enzyme was expressed at least 2- to 3-fold higher than the 
wild-type enzyme (data not shown). The mutations Val9Ile 
and Gln135Arg are far away from the active site and near the 
surface of the protein (Fig. 5), and may not have any 
significant effect on the enzymatic activity. The analysis of 
mutations obtained by molecular evolution of proteins provides 
a new tool for studying structure-function relationships. 
However, the real utility of DNA shuffling is the 
ability to rapidly improve enzyme functions without the need 
to delineate the myriads of complex molecular mechanisms. 

There are several possible applications for the evolved 
β-fucosidase. One is as a novel reporter for β-D-fucosyl 
substrates, in addition to the widely used lacZ gene reporter. 
The advantage of using the novel enzyme is the low endogenous 
background of β-fucosidase activity because, unlike 
α-fucosidases, β-fucosidases are uncommon in nature. This 
well expressed fucosidase also could be used for the production 
of fucosyl adducts or for disaccharide synthesis by transglycosylation 
or reversal of the hydrolysis reactions, because analogous 
applications already have been demonstrated for the 
wild-type β-galactosidase (32, 33). Some of these applications 
may require further evolution of the fucosidase for the specific 
reaction. The present data suggest that it is reasonable to 
attempt to obtain such improvements by DNA shuffling and 
screening of libraries of modest size. 

We thank A. Crameri for technical assistance and Drs. F. H. Arnold, 
R. E. Huber, and P. Schatz for useful discussions. 

Claims 2, 7, 8, 10, 13, 14, 16, 17, 19, 22, 23, 40, 46, 48- 
59, 61-63, 65, 66, 68, 72, 73, 75, 79, and 89 have been 
canceled. Claims 1, 3-6, 9, 11, 12, 15, 18, 20, 21, 24-39, 41- 
45, 47, 60, 64, 67, 69-71, 74, 76-78, 80-88, 90-94 and newly 
presented claims 95-96 are still at issue and are present for 
examination. 

Claim 64 remains withdrawn from further consideration 
pursuant to 37 CFR 1.142(b), as being drawn to a nonelected 
species, there being no allowable generic or linking claim. 
Applicant timely traversed the restriction (election) 
requirement in the response filed 11/18/02. 

Applicants' arguments filed on 6/22/06, have been fully 
considered. Rejections and/or objections not reiterated from 
previous office actions are hereby withdrawn. 

Claims 90, 95, and 96 are objected to because of the 
following informalities: the word "to" needs to be inserted 
following "selected" in the phrase "codons are selected reduce 
the number of identified sequences or sites". Appropriate 
correction is required. 

Claims 1, 3-6, 9, 11, 12, 15, 18, 20, 21, 24-39, 41-45, 47, 
60, 67, 69-71, 74, 76-78, 80-83, 85-88, and 90-96 are rejected 
under 35 U.S.C. 112, second paragraph, as being indefinite for 
failing to particularly point out and distinctly claim the 
subject matter which applicant regards as the invention. 

Claims 1 (from which claims 3-6, 9, 11, 12, 15, 20, 21, 24- 
39, 41-45, 60, 69, 70, 81, 86, and 90 depend), 47 (from which 
claims 71, 82, and 87 depend), 67 (from which claims 69, 70, 81, 
88 and 95 depend), 74 (from which claims 76, 77, 81, 88 and 96 
depend), and 78 (from which claims 80, 82, and 87 depend) are 
vague and indefinite in the recitation of "a reduced number of a 
combination of mammalian transcription factor binding sequences, 
intron splice sites, poly (A) addition sites and/or prokaryotic 
5' noncoding regulatory sequences", "wherein the mammalian 
transcription factor binding sequences are present in a database 
of transcription factor binding sequences" and "known mammalian 
transcription factor binding sequences". The rejection was 
described in the previous Office Action. 

Applicants argue that the terms "transcription factor 
binding sequences", "intron splice sites", "poly (A) addition 
sites" and "prokaryotic 5' noncoding regulatory sequences" are 
conventional in the art and argue that these terms are in fact 
used in the reference cited by the examiner in the 103 
rejection. This is acknowledged. However, in the art these 
terms define a group of sequences related by function. The art 
does not define clearly what sequences are included in the 
group. Since applicants' invention requires a skilled artisan to 
quantify the number of such sequences, it is imperative that the 
artisan know explicitly what sequences are to be included and 
what sequences are not so one can in fact count them. While the 
art clearly defines some specific sequences which fall into each 
group (for example AAUAAA as a polyadenylation sequence), many 
other sequences may have the same function and not all such 
sequences are known and taught by the art. 

With regard to calculating the number of mammalian 
transcription factor binding sequences, intron splice sites, 
poly (A) addition sites and prokaryotic 5' noncoding regulatory 
sequences, applicants point to Example 1 of the specification and the 
declaration of Dr. Wood submitted with the instant response and 
argue that both evidence that contrary to the Examiner's 
assertion, the calculation of the number of transcription factor 
binding sequences, intron splice sites, poly (A) addition sites 
and prokaryotic 5' noncoding regulatory sequences in a 
particular sequence is possible. However, it has never been the 
examiner's contention that given a clear set of sequences to be 
searched that calculation of the number was not possible, but 
that the claims are indefinite absent a clear definition of what 
sequences are encompassed by these terms. In Example 1 of the 
specification it is clearly set forth that transcription factor 
binding sequences were mammalian sequences identified in the 
SITE table of TRANSFAC database version 3.2 having a minimum 
length of 5 nucleotides and a minimum log-likelihood (as defined 
in the spec.) of 10, intron splice sites were AGGTRAGT, AGGTRAG 
or YNCAGG, poly (A) addition sites were AATAAA, and prokaryotic 
5' noncoding regulatory sequences were TATAAT or either of AGGA 
or GGAG paired with an ATG codon within 12 or fewer bases 3' to 
said sequence. Each of these is a clearly identifiable and 
defined set of sequences, presuming that the SITE table of the 
TRANSFAC database version 3.2 is obtainable. Similarly in the 
search discussed in the declaration of Dr. Wood, transcription 
factor binding sites are defined as mammalian sequences 
identified in the SITE table of TRANSFAC database version 4.0 
having a minimum length of 5 nucleotides and a minimum log- 
likelihood (as defined in the TESS software literature attached) 
of 10. This is a clearly identifiable and defined set of 
sequences, presuming that the SITE table of TRANSFAC database 
version 4.0 is obtainable. However, none of applicants' claims 
is limited in a similar fashion. It is suggested that 
applicants limit the claims to the sites used in Example 1 of 
the specification and submit a copy of the SITE table of version 
3.2 of the TRANSFAC database. 
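
For illustration only, the non-TRANSFAC site definitions quoted above from Example 1 can be expressed as a small counting routine. This sketch is not taken from the specification or the record. It assumes standard IUPAC codes (R = A or G, Y = C or T, N = any base), counts each distinct start position once per category, and reads "within 12 or fewer bases 3'" as a gap of 0 to 12 bases between the AGGA or GGAG motif and the ATG; the specification may count differently. The TRANSFAC SITE table lookup is omitted because it requires the database itself.

```python
import re

# IUPAC codes needed for the motifs quoted from Example 1.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

SPLICE_MOTIFS = ["AGGTRAGT", "AGGTRAG", "YNCAGG"]
POLY_A_MOTIFS = ["AATAAA"]
# Assumption: the ATG starts 0 to 12 bases after the AGGA/GGAG motif ends.
RBS_WITH_ATG = "(?:AGGA|GGAG)[ACGT]{0,12}ATG"


def to_regex(motif: str) -> str:
    """Translate an IUPAC motif into a regular expression."""
    return "".join(IUPAC[base] for base in motif)


def start_positions(seq: str, pattern: str) -> set:
    """Start positions of all (possibly overlapping) matches of pattern."""
    return {m.start() for m in re.finditer(f"(?=(?:{pattern}))", seq)}


def count_sites(seq: str) -> dict:
    """Count distinct start positions per site category in a DNA sequence."""
    seq = seq.upper()
    splice, poly_a = set(), set()
    for motif in SPLICE_MOTIFS:
        splice |= start_positions(seq, to_regex(motif))
    for motif in POLY_A_MOTIFS:
        poly_a |= start_positions(seq, to_regex(motif))
    prokaryotic = (start_positions(seq, "TATAAT")
                   | start_positions(seq, RBS_WITH_ATG))
    return {"intron_splice": len(splice),
            "poly_a": len(poly_a),
            "prokaryotic_5prime": len(prokaryotic)}


if __name__ == "__main__":
    print(count_sites("CCAGGTAAGTCCAATAAACCTATAATCCAGGACCCATGCC"))
```

Because AGGTRAG is a prefix of AGGTRAGT, a site matching the longer motif is counted once here rather than twice; that choice is a design decision of the sketch, not a statement of how the sites were tallied in Example 1.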
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Claim 18 as amended is confusing in the recitation "or the 
complement thereof which encodes a luciferase" as the complement 
does not encode a luciferase. 

Claims 47 and 83 as amended are indefinite in the 
recitation of "corresponding wild type nucleic acid sequence" as 
it is unclear to what sequences this must refer. Is this 
limited to the wild type sequences from which SEQ ID NOS: 9, 18, 
297, and 301 were derived (i.e., LucPpIYG, SEQ ID NO: 1) or to 
any wild type beetle luciferase gene? 

Claim 83 is confusing in the recitation of "hybridizes 
under medium stringency hybridization conditions to SEQ ID NO: 22 
(Rluc-final) ... and comprises an open reading frame encoding a 
luciferase with 90% amino acid sequence identity to a beetle 
luciferase" as SEQ ID NO: 22 (Rluc-final) is a variant of Renilla 
luciferase which is not a beetle luciferase and in view of the 
lack of similarity of Renilla luciferase with beetle 
luciferases, a polynucleotide which hybridizes to SEQ ID NO: 22 
could not encode a luciferase with 90% amino acid sequence 
identity to a beetle luciferase. While applicants amended claim 
47 to address this problem they did not amend claim 83. As such 
this rejection is maintained for claim 83. 

Claims 1, 3-6, 9, 11, 12, 15, 20-21, 24-33, 35-39, 41-45, 
47, 60, 67, 69, 70, 81-82, 86-88, and 90-95 are rejected under 



Application/Control Number: 09/645,706 Page 7 

Art Unit: 1652 

35 U.S.C. 112, first paragraph, because the specification, while 
being enabling for (1) a variant of a parent DNA molecule 
encoding a reporter polypeptide identical to a reporter 
polypeptide encoded by said parent DNA, having more than 25% of 
the codons altered and having a reduced number of transcription 
factor binding sequences, intron splice sites, poly (A) addition 
sites and promoter sequences than a mammalian codon optimized 
variant of the parent nucleic acid, (2) a variant of a parent 
DNA molecule encoding a luciferase having 90% identity to the 
polypeptide encoded by SEQ ID NO: 2 and having more than 25% of 
the codons altered and having a reduced number of transcription 
factor binding sequences, intron splice sites, poly (A) addition 
sites and 5' noncoding regulatory sequences than a mammalian 
codon optimized variant of SEQ ID NO: 2 or (3) to any nucleic 
acid which will hybridize to SEQ ID NO: 9 under high stringency 
conditions and encode a polypeptide having luciferase activity, 
does not reasonably provide enablement for any variant DNA 
molecules encoding any reporter polypeptide having at least 90% 
identity to a wild type reporter polypeptide, having more than 
25% of the codons altered and having a reduced number of 
transcription factor binding sequences, intron splice sites, 
poly (A) addition sites and 5' noncoding regulatory sequences 
than a mammalian codon optimized version of the parent nucleic 
acid or to any nucleic acid which will hybridize to SEQ ID NO: 9 
under medium stringency conditions. The specification does not 
enable any person skilled in the art to which it pertains, or 
with which it is most nearly connected, to use the invention 
commensurate in scope with these claims. The rejection is 
explained in the previous Office Action. 
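
As an illustration of the "more than 25% of the codons altered" limitation discussed above, the following sketch compares a parent coding sequence with a variant that encodes the identical polypeptide and reports the fraction of codons that differ. It is not drawn from the specification; the function names are hypothetical, the standard genetic code is assumed, and the short test sequences are invented for the example.

```python
# Standard genetic code in compact form (first, second, third base in TCAG order).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(seq: str) -> list:
    """Split an in-frame coding sequence into codons."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def translate(seq: str) -> str:
    """Translate a coding sequence with the standard genetic code."""
    return "".join(CODON_TABLE[codon] for codon in codons(seq))


def fraction_codons_altered(parent: str, variant: str) -> float:
    """Fraction of codon positions that differ between parent and variant.
    Raises if the two sequences do not encode the identical polypeptide."""
    parent_codons, variant_codons = codons(parent), codons(variant)
    if len(parent_codons) != len(variant_codons):
        raise ValueError("sequences have different numbers of codons")
    if translate(parent) != translate(variant):
        raise ValueError("variant does not encode the identical polypeptide")
    changed = sum(1 for p, v in zip(parent_codons, variant_codons) if p != v)
    return changed / len(parent_codons)


if __name__ == "__main__":
    # Invented five-codon example: two synonymous codon changes out of five.
    print(fraction_codons_altered("ATGGCTAAATTTTAA", "ATGGCCAAGTTTTAA"))
```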

Applicants first state that it is unclear how Applicant's 
specification teaches one of skill in the art how to make and 
use a variant of a parent DNA molecule encoding a reporter 
polypeptide identical to a reporter polypeptide encoded by said 
parent DNA, having more than 25% of the codons altered and 
having a reduced number of transcription factor binding 
sequences, intron splice sites, poly (A) addition sites and 
promoter sequences if the art worker would not recognize or 
understand sequences that are mammalian transcription factor 
binding sequences, intron splice sites, poly (A) addition sites 
and prokaryotic 5' noncoding regulatory sequences. Applicants 
are in fact correct that, if a skilled artisan cannot clearly 
identify these sites, he can not practice the invention as 
taught. However, for the instant rejection the claims were 
examined as best possible ignoring this problem (as it is 
clearly addressed by the rejection above) in the interest of 
compact prosecution, such that all possible problems could be 
identified concurrently. The instant rejection would be 
maintained even if the claims clearly identified all such sites 
for the reasons presented. 

Next applicants argue that with respect to reporter 
polypeptides, such as GFP, beetle luciferase, GUS, CAT, and 
beta-lactamase, applicant has provided evidence that it is well 
within the skill of the art to introduce substitutions into 
various reporter proteins and yield a variant protein with the 
activity of the corresponding wild-type reporter protein. 
However, it is noted that the evidence applicants refer to is 
available for specific GFPs, beetle luciferases, GUS or CAT 
enzymes, and beta-lactamases but each of these groups of 
reporter polypeptides includes vast numbers of proteins which 
are not well characterized and often substantially different 
from those taught in the art. For example there are many 
different luminescent beetle species but only a few firefly and 
click beetle luciferases are well characterized in the art and 
even these enzymes differ from each other enormously. The 
rejected claims are not limited to nucleic acids encoding 
reporters exhibiting high similarity to only those reporters 
which are well characterized (Note claims that are so limited 
such as claims 18, 71, 74, 76-78, 80, 83-85, and 96 are not 
rejected). 

Finally applicants argue that one of skill in the art in 
possession of applicant's specification is readily able to 
determine whether a variant nucleic acid molecule hybridizes 
under medium stringency conditions to Applicant's synthetic 
polynucleotides and has an open reading frame encoding a beetle 
luciferase polypeptide which has at least 90% amino acid 
sequence identity to a luciferase encoded by a corresponding 
wild type nucleic acid sequence, and having a reduced number of 
transcription factor binding sequences, intron splice sites, 
poly (A) addition sites and 5' noncoding regulatory sequences 
than a mammalian codon optimized version of the parent nucleic 
acid. However, while methods of determining if any individual 
sequence would have the properties described are well known in 
the art, methods of determining which sequences from the 
virtually infinite genus of sequences capable of hybridizing 
under medium stringency conditions to the recited nucleic acids 
and encoding a protein having 90% identity to any beetle 
luciferase actually are within the scope of the instant claims 
(i.e., encode a luciferase protein) beyond just making and 
testing all possibilities are not provided. While enablement is 
not precluded by the necessity for routine screening, if a large 
amount of screening is required, the specification must provide 
a reasonable amount of guidance with respect to the direction in 
which the experimentation should proceed. Such guidance has not 
been provided in the instant specification. 

The following is a quotation of 35 U.S.C. 103(a) which 
forms the basis for all obviousness rejections set forth in this 
office action: 

(a) A patent may not be obtained though the invention is not identically 
disclosed or described as set forth in section 102 of this title, if the 
differences between the subject matter sought to be patented and the prior 
art are such that the subject matter as a whole would have been obvious at 
the time the invention was made to a person having ordinary skill in the 
art to which said subject matter pertains. Patentability shall not be 
negatived by the manner in which the invention was made. 

This application currently names joint inventors. In 
considering patentability of the claims under 35 
U.S.C. 103(a), the examiner presumes that the subject matter 
of the various claims was commonly owned at the time any 
inventions covered therein were made absent any evidence to 
the contrary. Applicant is advised of the obligation under 37 
CFR 1.56 to point out the inventor and invention dates of each 
claim that was not commonly owned at the time a later 
invention was made in order for the examiner to consider the 
applicability of 35 U.S.C. 103(c) and potential 35 
U.S.C. 102(e), (f) or (g) prior art under 35 U.S.C. 103(a). 

Claims 1, 3-6, 9, 11, 12, 15, 20, 21, 24-39, 41-45, 60, 67, 
69, 70, 81, 86 and 90-95 are rejected under 35 U.S.C. 103(a) as 
being unpatentable over Sherf et al. (US Patent 5,670,356) in 
view of Zolotukhin et al. (US Patent 5,874,304), Donnelly et al. 
(WO 97/47358), Pan et al., Cornelissen et al. (US Patent 
5,952,547), and Hey et al. (US Patent 6,169,232). The rejection 
is explained in the previous Office Action. 

Applicants argue that the combination of references does 
not disclose or suggest Applicant's invention as each reference 
discloses a different way to modify the coding sequence of a 
different gene to increase expression. This is not persuasive 
because each of these references is drawn to methods of 
increasing the expression of a gene in a desired host by 
altering the sequence of the nucleic acid but not the encoded 
protein in a variety of ways which will lead to increases in the 
production of desired protein. The cited references show that 
the art was clearly aware that a combination of changes in codon 
preference and removal of sequences detrimental to transcription 
and/or translation in either the wild type gene or the codon 
optimized version can be used to accomplish this goal. While 
each of the cited references used a different combination of 
types of modifications, the art clearly teaches all of the 
claimed modifications encompassed in applicants' claims (i.e., 
mammalian codon optimization, removal of transcription factor 
binding sequences, removal of splice sites, removal of potential 
promoters, and removal of polyadenylation sites) and clearly 
teaches combinations of them with one or more of the others. 

Applicants argue that while there is a general teaching in 
the combination of cited documents to alter codons and/or remove 
certain undesired sequences in a selected sequence, none of the 
cited documents teaches or suggests that codon alterations, to 
prepare a sequence with codons employed more frequently in an 
evolutionarily divergent organism optionally in conjunction with 
removal of restriction enzyme sites, ATTTA sequences, splice 
sites, polyA sites, A or T strings, CG dinucleotides in adjacent 
codons, prokaryotic promoters, inverted repeats and prokaryotic 
factor-independent RNA polymerase terminators, may create 
transcription factor binding sites and none of the cited 
documents discloses or suggests removal of transcription factor 
binding sites from a codon optimized gene. While it is true 
that none of the cited documents explicitly teach that codon 
replacements may create unwanted transcription factor binding 
sequences not present in the wild type sequence, Hey et al., 
Donnelly et al. and Pan et al. all show that the art recognized 
that codon modifications can introduce sequences which are 
unwanted within the synthetic gene, that additional codon 
modifications can decrease the introduction of those sequences 
and Sherf et al. clearly teach that the presence of 
transcription factor binding sequences within a reporter gene is 
an unwanted feature as it may interfere with the desired genetic 
neutrality of the reporter gene (see column 8). Furthermore, it 
is obvious on its face that anytime a gene sequence is altered 
that one necessarily creates new sequences which were not 
previously present and that merely by random chance some of 
these newly created sequences may be detrimental. It is even 
further obvious on its face that the more changes one makes, the 
higher the chances that such a detrimental sequence will be 
introduced. Sherf et al. made only limited changes to codon 
selection and thus at least in his explicit teachings focused on 
the elimination of detrimental sequences present in the wild 
type sequence. However, the remaining art clearly would have 
motivated one of skill in the art to make more substantial 
changes in codon preference within the luciferase of Sherf et 
al. Furthermore the disclosures of Hey et al., Donnelly et al., 
and Pan et al. would have clearly led a skilled artisan to scan 
not only the wild type sequence for the unwanted transcription 
factor binding sites but also the codon optimized version 
thereof. 

Applicants argue that to arrive at applicant's invention, 
one of skill in the art in possession of the cited documents 
would need to choose to identify specifically transcription 
factor binding sites, promoter sequences, splice sites, and 
polyA sites, as sequences to be removed by codon replacement 
although the references also teach removal of internal 
palindromic sequences, restriction endonuclease sites, 
glycosylation sites, ATTTA sequences, RNA polymerase termination 
signals, TA and CG doublets, blocks of G or C residues, inverted 
repeats, and long runs of purines. However, this is not 
persuasive as applicants' claims are not drawn to any combination 
in particular and do not exclude removal of other detrimental 
sequences in addition to those specifically recited in the 
claims and the art clearly teaches several combinations of 
these. 

Applicants argue that none of the cited documents discloses 
or suggests the use of software to identify particular 
regulatory sites, such as mammalian transcription factor binding 
sequences, in a database of transcription factor binding 
sequences. However, this is not persuasive as most of 
applicants' claims do not even mention the use of software to 
identify sites to be removed. Furthermore, even for those 
claims that do mention this, it is noted that the claims recite 
products not processes. Patentability of a product recited in 
product-by-process format is determined by the characteristics 
of the product itself not by the recited method. A nucleic acid 
in which the sites to be removed were identified by an undefined 
computer program would not differ in any respect from a nucleic 
acid in which the sites to be removed were identified by any 
other method. 

Applicants finally argue that one of ordinary skill in the 
art in possession of the cited art would have no reasonable 
expectation that any particular set of changes would improve 
activity in a gene that is to be expressed in a highly 
evolutionarily distinct cell. This is not persuasive because 
the art clearly provides an expectation that codon optimization 
and the elimination of a variety of types of sequences which are 
detrimental to transcription and/or translation will improve the 
expression of a gene in a heterologous host. While it is 
acknowledged that one cannot be certain that the modifications 
will not have unexpected consequences, applicants are reminded 
that obviousness does not require an absolute certainty of 
success but only a reasonable expectation thereof. 

Claims 18, 47, 71, 74, 76-78, 80, 82-85, 87, 88 and 96 are 
rejected under 35 U.S.C. 103(a) as being unpatentable over Sherf 
et al. (US Patent 5,670,356) in view of Zolotukhin et al. (US 
Patent 5,874,304), Donnelly et al. (WO 97/47358), Pan et al., 
Cornelissen et al. (US Patent 5,952,547), and Hey et al. (US 
Patent 6,169,232) as applied to claims 1, 3-6, 9, 11, 12, 15, 
20, 21, 24-39, 41-45, 60, 67, 69, 70, 81, 86 and 90-95 above, 
and further in view of Wood et al. (WO 99/14336). The rejection is 
explained in the previous Office Action. 

Applicant has not presented any arguments specifically 
traversing this rejection but instead relies upon the traversal 
discussed above. Therefore, this rejection is maintained for 
the reasons presented above. 

The nonstatutory double patenting rejection is based on a 
judicially created doctrine grounded in public policy (a policy 
reflected in the statute) so as to prevent the unjustified or 
improper timewise extension of the "right to exclude" granted by 
a patent and to prevent possible harassment by multiple 
assignees. A nonstatutory obviousness-type double patenting 
rejection is appropriate where the conflicting claims are not 
identical, but at least one examined application claim is not 
patentably distinct from the reference claim(s) because the 
examined application claim is either anticipated by, or would 
have been obvious over, the reference claim(s). See, e.g., In 
re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re 
Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re 
Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van 
Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 
F.2d 438, 164 USPQ 619 (CCPA 1970); and In re Thorington, 418 
F.2d 528, 163 USPQ 644 (CCPA 1969). 

A timely filed terminal disclaimer in compliance with 37 
CFR 1.321(c) or 1.321(d) may be used to overcome an actual or 
provisional rejection based on a nonstatutory double patenting 
ground provided the conflicting application or patent either is 
shown to be commonly owned with this application, or claims an 
invention made as a result of activities undertaken within the 
scope of a joint research agreement. 

Effective January 1, 1994, a registered attorney or agent 
of record may sign a terminal disclaimer. A terminal disclaimer 
signed by the assignee must fully comply with 37 CFR 3.73(b). 

Claims 91, 93 and 94 are provisionally rejected on the 
ground of nonstatutory obviousness-type double patenting as 
being unpatentable over claims 1-50 and 58-60 of copending 
Application No. 10/314,827. Although the conflicting claims are 
not identical, they are not patentably distinct from each other. 
The rejection is explained in the previous Office Action. 

Applicants argue that the claims in the present application 
are directed to synthetic nucleic acid molecules for 
chloramphenicol acetyltransferase, Renilla luciferase, beetle 
luciferase, beta-lactamase, beta-glucuronidase or beta- 
galactosidase while the claims of 10/314,827 are directed to 
synthetic nucleic acid molecules for 
a fluorescent polypeptide. However, it is noted that applicants 
have not amended claims 91, 93 and 94 of the instant application 
to synthetic nucleic acid molecules for chloramphenicol 
acetyltransferase, Renilla luciferase, beetle luciferase, beta- 
lactamase, beta-glucuronidase or beta-galactosidase. These 
claims recite synthetic nucleic acid molecules encoding any 
reporter polypeptide which clearly includes the fluorescent 
polypeptides of the copending application. 

Applicant's amendment necessitated the new ground(s) of 
rejection presented in this Office action. Accordingly, THIS 
ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is 
reminded of the extension of time policy as set forth in 37 
CFR 1.136(a). 

A shortened statutory period for reply to this final action 
is set to expire THREE MONTHS from the mailing date of this 
action. In the event a first reply is filed within TWO MONTHS 
of the mailing date of this final action and the advisory action 
is not mailed until after the end of the THREE-MONTH shortened 
statutory period, then the shortened statutory period will 
expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated 
from the mailing date of the advisory action. In no event, 
however, will the statutory period for reply expire later than 
SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier 
communications from the examiner should be directed to Rebecca 
E. Prouty whose telephone number is 571-272-0937. The examiner 
can normally be reached on Tuesday-Friday from 8 AM to 5 PM. 
The examiner can also be reached on alternate Mondays. 

If attempts to reach the examiner by telephone are 
unsuccessful, the examiner's supervisor, Ponnathapura 
Achutamurthy, can be reached at (571) 272-0928. The fax phone 
number for this Group is 571-273-8300. 

Information regarding the status of an application may be 
obtained from the Patent Application Information Retrieval 
(PAIR) system. Status information for published applications 
may be obtained from either Private PAIR or Public PAIR. Status 
information for unpublished applications is available through 
Private PAIR only. For more information about the PAIR system, 
see http://pair-direct.uspto.gov. Should you have questions on 
access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 




Rebecca Prouty 
Primary Examiner 

Codon usage tabulated from the GenBank Genetic Sequence Data 

Shin-ichi Aota, Takashi Gojobori, Fumie Ishibashi, Takeo Maruyama* and Toshimichi Ikemura 



National Institute of Genetics, Mishima, 411, Japan 



In 1980 and 1981, Grantham and his colleagues (1,2) reported the codon 
usages in a total of 161 protein genes in this Journal, and in 1986 we 
reported those in 1638 genes (3). In the present work, the codon usages in 
3681 genes are analyzed using the nucleotide sequence data obtained from the 
GenBank Genetic Sequence Data Bank (Release 50.0, 1987) (4). In selecting 
protein coding sequences we relied on the FEATURES tables of the GenBank 
Database, and only complete genes, starting with an initiation codon and 
ending with one of the stop codons, were used in the analysis (see ref. 3 for 
details). Table 1 lists the codon use in each of the 3681 genes. The LOCUS 
names given in GenBank were used to designate individual genes, and 
the SHORT DIRECTORY of GenBank is presented after the table to define 
each LOCUS name. In GenBank, a group of consecutive genes whose entire 
region has been sequenced is registered under one LOCUS name. To distinguish 
the different genes belonging to a single LOCUS, the symbol # followed by 
a number is added after the LOCUS name; the numbers represent the order of 
the peptides registered in the FEATURES of GenBank. Thus the numbering 
system differs from the previous one (3) in cases where incomplete 
peptides are registered in the FEATURES. When the introns of a gene have not 
been completely sequenced, some of its exons are registered in separate 
entries (LOCUS) in GenBank. Such exons, belonging to the same gene but 
having different LOCUS names, were combined, and the LOCUS name of the last 
exon followed by the symbol * was given to the gene thus combined. The order of 
the codons in the table is the same as in the previous compilations (1-3). The 
amino acids based on the "universal genetic code" are specified using 
three-letter abbreviations, except on the pages listing organelle genes. 
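
As a rough illustration of the selection rule described above, the following Python sketch tests whether a coding sequence counts as a complete gene and builds a "#"-style label for a gene within a multi-gene LOCUS. It is not the program used for the compilation. The function names, the restriction to ATG as the initiation codon, the multiple-of-three length check, and the LOCUS name in the usage note are assumptions made only for this example.

```python
# Stop codons of the universal genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_gene(cds, start_codons=("ATG",)):
    """Return True if cds starts with an initiation codon and ends with a stop codon.

    Assumption: only ATG is treated as an initiation codon by default, and the
    sequence length must be a multiple of three.
    """
    seq = cds.upper()
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    return seq[:3] in start_codons and seq[-3:] in STOP_CODONS


def gene_label(locus, peptide_index):
    """Label the n-th peptide registered under one LOCUS as LOCUS#n."""
    return f"{locus}#{peptide_index}"
```

For instance, `is_complete_gene("ATGGCTTAA")` is true, while a sequence lacking either the initiation codon or the stop codon is rejected; `gene_label("LOCUSX", 2)` gives `LOCUSX#2` for a hypothetical LOCUS name.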

To reveal the characteristics of the codon use of individual organisms, 
as well as of viruses and organelles, the frequency (per one thousand) of codon 
use in each organism for which more than 20 genes are available in Table 1 
was calculated by summing the counts of each codon over those genes (Table 2). 
The number of genes summed for each organism is given in the row designated as 
No. GENES, and the total number of codons thus summed is given in the bottom 
row. Since the codon usage of each organism thus summed is expressed as a 
frequency per one thousand in Table 2, it is easy to compare the codon-choice 
patterns among different organisms. In the previous work (3), we noted that the 
resultant codon-choice patterns among the vertebrates, or at least among the 
mammals, are very similar, although the codon choices in individual genes of 
one mammal are often very different from each other (e.g., see ref. 5). We 
also mentioned that the codon-choice pattern, which is roughly common among 
the mammals, does not depend on the choice of genes; i.e., when the codon 
frequencies for ten or more genes with varying functions are summed for 
each mammal, they usually result in a very similar pattern regardless of the 
genes compiled (6). Tables 1 and 2 confirm the previous notion. 
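
The per-thousand frequencies described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build Table 2: the function names are hypothetical, every in-frame triplet (including the terminal stop codon) is counted, and any trailing partial codon is ignored.

```python
from collections import Counter


def count_codons(cds):
    """Count in-frame codons of one coding sequence (trailing partial codon ignored)."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def usage_per_thousand(genes):
    """Sum codon counts over all genes and express each codon per 1000 codons.

    Returns (frequencies, total_codons). Assumption: stop codons are included
    in the total, which the source text does not specify.
    """
    total = Counter()
    for cds in genes:
        total.update(count_codons(cds))
    n = sum(total.values())
    if n == 0:
        return {}, 0
    return {codon: 1000.0 * c / n for codon, c in total.items()}, n
```

Because each organism's counts are divided by its own total, the resulting values sum to 1000 for every organism, so profiles built from different numbers of genes can be compared directly.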
HIMFCRH HUMWFERRrrWH CHAM UmA. COMPLETE COS. TMP 
HUMFCRHCa HUMANFCRRrTNhCAVY<C>4AMGeCEXCNBr9AN0 4,*weP 
HUMFERL HUMAN FERRTTML CHAM MRNA. COMPLCTC COS. I21BP 
HUMFCRLS HUMAN FERRmNLKMISUftUNrTUmA, COMPLCTC COS. TZSP 
HUMRX HUMAN FACTOR tXfCHRSTMAa FACTOR} MRNA, COMFLETECOOMaSEOUENCE 
HUMRXA HUMWFACT0RIX(CHRtSTMA3FACTOR)MRNA.C0MR.ETEC0OM0SEOU&Kfi. 
HUnxO HUMN4 FACTOR « CM. COMPLETE COO SMMBP 
HUMRXQ* HlMMlFJCT0A0IOCN6.EXONS7AND».3ltlV 

Hi^tfOL* HUMAN pMYDAOFOLATC REDiCTASC OeC. KXCN t AND T RJIMC. SM40P 
HUMA0AM2 HUMW AOeWBIC OEMMMASCyRNA. COMPlfTE COOMO SCQUBCC. ICMBP HUMFOIM HUMAN OMVOROFOLATE RCDUCTASC OCNC cDlTR). EXON a JTaAP 
HUMADHU HUyUMAOHtunNAENCOOMaALCOHOlDCHVDR008>WftCaAft8IM.PHAt4(teP HUMFCLMC8 MAUN OMVOROm^TI REDUCTASE oe«. 9neP 
HUMADHM HUM«LIVMM.C0Ma0eHirW*0O©4A6«l*0H) ALPHA SUBUMTMIMA.I450eP HUMFOLPOf HUMAN OtHYD«3«3tATC RE OUCTASC PSEUOOODCfPSI- HOI) ••40P 
HUMAOHBB HUM»I AOW 0O« BCOOMO CLASS ( AtCOMa 0CHV0R00O*ASC BfTA-t TOOSP HUMfOS W*IAN POS PR0TaONCO0C«(C«6t. COMPLCTC COS, 41 
HUMADHBR HUMW UWA fVR M.COHOL OCMVOnOOENASC UTA t SUOUNr 2U2BP HUMFTRRA HUMAN TRWSFCRRtN RECEPTOR MRNA, COMPLETE CDS, WIOOP 

HUUADHIO HUUJWAO»aunNAB«X>OMa ALCOHOL OCHYDAOOeuSE CLASS I aAM4A14MaPHUktfVll HUMAN FACTOR VI < PRECURSOR OF SCRMC PROTEASE) UWA. COMPLCTC CDS. 
HUMAFH HLUAN APOFERfimN (HCHAM) MRNA. WIBP HUMFVIi MMAN COAGULATION FACTOR VlIC (AMTVWa»PM|.IC FACTOR) MRNA 

HUUAOO HUMWJWKWOOOmOCNE. COMPLCTC COS. WO THREE ALU RCPCTTT1VE MMFVItt HUMAN FACTOR Vtl MRNA MfTSP 

HUMAOPIA HUMWALPM-t ACnOLVCOPAOrCM MRNA. COMPLCTC COS TSnP HUMFVIIC HUMAN COAOULATWN FACTOR VII £ MIMA. CCMPLCTC COS, «0)MP 

HUMM DaTAAMMOUVULNATC OCMVDRATASC MRMA, COMPLCTC COS. 1 1 SaSP HUMFXS HUMAN FACTOR X (BLOOO CQAOULATION FACTOR ODC. EXONt jUtt P 
HUMAN ALSUMMMRN^ COMPLETE CDS. •14eP * 



PRIMATES 

CHPWAIM APC(CHMPANZEOM.PHA-tA0SN MRNA U IBP 
CHPWA3M APC(CHB»mZB)M.PHA4-0LC»MUflNA|4«P 
OBNJ APE(aaa0N>NTCRLEUKMXMmA.730SP 

OSULTR APS (aagON)Mrcn.ElJKMt MRNA CONTAMMONSCRTEDLTRIMtBP 
OWU O)0Q0NNTBUUKH3a-3)URNA,COMarTECO6,t7«ap 
HUMAIACM HUMAN ALPHA 1 AKnCKYMOTFYTM* COMPL ETC Oe«. MRNA l&aC«P 
HUMAIATM HUUWAL^HA-l-ANTmiVP8NMnNA,C0MPLETCCDS, tX2BP 
MMAIATP HUMAN ALPHA-l-ANTTTRVPSMOM IS VAAtANT). COMPLETE COO, 1 222SP 
HUMAIATR HUMANALPMA-1-ANTTmV?SMMmA.COMPLrTEC0e.1»4«eP 
HUMAIATZ HUMM4Z TYPE ALPHA IWWTTTRmMOBIE. COMPLETE COS (EXONS 14). 
HUHIA3M HUMM ALPHA^-UACROOLCBULH MRNA, COMFUtTE COO. 447TBP 
mMATPI HUMWALPHAA-TMOLPnOTtVIASCMHWTOR MRNA. COMPLETE COOMO 
HUUACCVBA HUMAN CYTOnJUWCBCTA-ACTMOBiC. COMPLETE COS MSTftP 
HUMACCYBB HUHW CYTCPUttMC BETA ACTM Oe«. COMPLETE CDS, M4«P 
HUMACHRA7 HUMM4ACCTYLCHOLMSnCCCPTORALPMASUOlMrrO£NS,EXON»PtANDP9. 
HUUACHRO* HUMM4 AC FTVLCHaMS RECEPTOR aAMUSUttUNn-aB4E, CXON 1 2. 703BP 
HUMACTASK HUMAN ADULT BKB.ETAL WUSaS ALPHA 4CTM MRNA. 13 WBP 
HUUACTCA4 HUHM4ALPHACAR0IACACTM0eC. CXON* AND TFLAM(.r«ieP 
HUMADA HUMW AOBCSMi DCMCMASE MMA, COMPLETE COS. I4TMP 
HUUAOAtt HUMANAI)£N0SMCDCAMNA8S0OC,EX0N12.32G8P 

HUMW ADBMSMC DCMMMASE OeC.C0Mn.ETC CDS. 3IT41BP 



HUMW MAUMM MRNA, COMPLETE CDS. tlSBP 

I HUMM4SCRUM MAUMM MRNA. COMaCTE COS tK«BP 

HUUUAF4 HUMN4M.PHAFET0PncrrtM (AFP) MRNA. COMPLETE COS. MOaBP 
HUMM.Ba HUMAN SERUM PRCALBUMMOENC. 7*1 IBP 
HUMALDOC HUMM4 SCRUM ALBUM OCNCCOMFIETC CDS. IMOtSP 
HUMALOB HUUW M.DOUSi B COMPLETE COOMO RCOON. MWA UttBP 
HUMALDOX HUMANALDQLASSBMmA.COMPLCTCCOS,ieseaP 
HUMALOOeR HUMM MRNA FOR M.00LASEB.1 MOP 

HiMN.PL HUMW UVCR40WKDNCY TYPC ALKALNC PHOSPHATASE MRNA, COMPLETE 

HUMALPPB HUMMPlACC)>frM.«JCMJ4£ PHOSPHATASE TYPC 3 MRNA 27MB P 

HUMMYAP MJMWPANCRiAT1CM.PHA4MYLASCWWAieMBP 

HUMAMVAS HUMMSM.IVAnnraUW«M.PHA M«yLA8CMRNA imSP 

HUMANFA HUhUNPNOOCNteiCOWNOATIMLHATRWRCTIC FACTOR. COMPLETE COS 

HUMANO HUMAN MaiOTMM00e4timA.C0MnCTE COS. 2mQP 

MMAPOASt HUMAN APaS>OPnOFTEN AD ae«. COMPLETE CDS. ZtnOP 

HUMAPOAll HUMW APaVOPROrEN Al AND C M OOCS. COMnCTE COS, D37BP 

HUMAPOAI HUMAN APaiPOPRCrTEMOeCA-n ON CHROMOSOME 1.2WIBP 

HUMAPOaa HUMAN APaPOPfWTEMB- 100 MRNA. STARTMO AT CXON 3. 13i7JBP 

HUMAPOQA HUUW APaVOPnOTCMB-IW MRNA. COMPLETE COS. t4CrR«P 

HUMAPOee HlAUV«APOLPOPnOTENB-100MRNA.COMPLETCCOS.13M3eP 

HUMAPOC2 HUMM4APOCIOB«B4COOMOAPOLIPOPROnEMCII. COMPLETE COS. 

HUMAPOCI HUMJWAPOUPOPROTEMC-tMRNAAIttP 

HUMAPOCI HUMM4APOUPOPR0TEN&tiaeCCOMnETCCDS.4340eP 

HUMAPOO HUMAN APOUPOPnOTEf*0MnNA,COMPl£TE COS. WSOP 

HIMAPOG3 HUMANAPOLPOPnaTEHE<£PS«,€N2AN03ALLaES}MRNAn6«BP 

HUMAPOe4 HUMAN APai»OPnorTEHC(£PS«.ON4AtiaOOOC. COMPLETE COS. 

HUMAROL HUMM4 LIVER AROMASC MRNA, COMPLCTC COS. U4«BP 

HUMASt HUMAN AAaMMCSUCCMATC SVWTWTASfi O0C. EXONS 10. H. 12 AND 13. 

HUMASA HUMW ARQffMOSUCCNATE SVHTMCTASE MMA, COHPlEH CDS. tUTBP 

HUMASOPRI MUMWASIM.0aLYC0PAOTENRCCEFTORmMfMA.C0MnETBCDS, I777BP 

HUMASGPR2 HUMW AStaOOLYCOPnOTEN RECEPTOR K2 MWA. COMPLETE COS. ISMBP 

HUUASL HUMAN AAOMMOSUCCMATE LYASE MRNA. COMPLETE COS. 1k4tBP 

HUMATM HUMANANTTT>«t0t«WI1Oe«.CX0N0.2«7BP 

HUMATCC02 HUMAN T-CaX SURFACE Mnoe* COS rtl1)MFWA.C0MnCTC COS, 1MMP 
HUMATCTU HUMM TCai SURFACE MHOei T3 OaTA^HAM 09*. CXONB 2.3^ AND 
HUMATCT4 HUMM T-CEU SURFACC OLVCOPRCrTEM T4 MWA. COMnCTE COS , 1 742BP 
HUMATPC HUMW AORrATF CARRER PROTEM MRNA, COMPLETE COS. 1 SaBP 
HUMCL2A HUMM<»«ei.LCU(BIUVLYMPH0MAt{BCL-2)PnorO^ONCOaaCM»MMMaP 
HUMBCL3B HUMM B4»I LCUKBI WLYMPHOMA 2 (9a-2t PROTO^MCOOD* MRNA 01 IBP 
HUWHA HUMM BETA I Circs MMPABC ALPHACHAM MMA. COMnCTE COS, 23t7BP 
HUMLWI HUMM BLVM-1 TRMSFORMMO OB*. COMPLETE COONO REOION. tOOtBP 
HUMClMHA HUMM PLASMA PROTIASE ten MHWron MRNA, COMn.ETE COS. I WtBP 
HUMC1»*« HUUM PLASMA PROTEASE (CI) MHWrORMTMA. COMPLCTC COS. 1M18P 
HUUC3 HUMM C0MaWBfrC0WP0Ne/TC3 MRNA. ALPHA ANO BETA 8UBUNfT8.60«7eP 
HUMCM. HUMM CACHONM, MRNA COMPLCTC COOtlQ SCOUB4CC. 7VIBP 
HUMCALC6C HUMM BCNLUNOCARCMOMACaiLMC MRNA FOR HK}HM(R)CALCrrONM 
HUMCN.CR4 HUMMCM.CfT0M«CALCfT0MNOe<E-RELATEDPEPTnCa8ilC.CX0N4. 
HUMCCK2 HUMM CHaECYSTOKNN OWE. CXON 2. JMBP 

HUMCCRP HUMM CCRULOPLASMMfFERROXIOASEl MRNA. COMPLETE COS. 3UteP 
HUMM CHORIOMC OONAOOTROm (HOO) OeC S, BETA«INUMT. 1 0MBP 
HUMM CHORKMC OONADOTROPM (HCO) Oe« BCTA4UUMn. 1 MiBP 

HUMM CMORKNCOONAO0mOPitl(HCO) BETA SUBUMT MRNA ft3«eP 

HUMCOaaA3 HUMM CHORIOMCaQNADOmonN BETA SUBUNn^Oe«G. CXON 3. CLONE «t4SP 
HUMCOBao HUMM CHORMNC OONAOOT nO P M BETA SUBIMTT OB* . CXON 3. 4 JMP 
HUMCMHP HUMM nASMASCRMEPROTlABE (PROTEM OWWTOR MIMA COHnCTE 
HUMM CREATMSKMAKM MRNA. COMPLETE COS. IMSBP 
HUMM HUMOS OM HOMOLOOOUS TOTRAMFORMNQ OOC OFMUBV, ISOOBP 
HUMM SKM COlArMNAtC IWNA, COHPLCTE COS. tffTOBP 
HUMM BLUE CONS PHOTORSCCPTCNI PraMOfT Oe«. CXON ». 377aP 
HUMCPOO HUMM ORra* CONE PHOTORSC^TCRnOMerrOMCl.ExONraMBP 
HUMCPfM HUUM RSO COM PHOTORECEPTOR POM WrOCNC. CXON O SUBP 
HUMCRF HUMMCOffTVOmORM^VLCASMO FACTOR |CRF) OB«. 2MSBP 
HUMCRPO HUMM C-REACTIVE PROTEM 0O«. COMPLETE COS, M3«P 
MMCRVBi HUMMBCTACRVITAUJNOeC(HUMTAAyAtLCK0NI.4«7BP 
HUMCRY002 HUMM OAAAAACRVVTALIMO OENC. CXON 3l JMBP 
HUMCRVO(M HUMMaMMA-CRVBTALLPMOCNS.fiX0Nl4a3BP 
HUMCRVGXt HUMMaWMACRmAUM14 0M.EX0N3.4HSP 
HUMCAVOXA MMMO«MACRnrtTALLMMOBS.EX0N3.4S3BP 

HUUCSt HUaW CH0RO<C SOMATOIilAIWOrWOPM OENE HCS-1 . COMPlTFE COO. PO ISP 
HUMCSFtMB HlI*MMAC«0PHAO»-BPeCIFCC0L0HV.»TKULATWO FACTOR tC8F-n,t23lOP 



HUMFXI HUMM FACTOR XI (BLOOD COAOUUTION FACTOf^ MfMA, COMPLCTC COS 
HUMFXIiA HUMM nACB4TM.FXI>A MRNA COMn.CTE CDS. SAtttP 
HUUQSPO HUMM aLVCCAM.OCHVDCO^HO$PHATC DCHVOROOeuSC MfMA, COMPLETE CDS 
HUMOJPDA HUMM aLYCCRM.OEHYDC-»-PHOSHATE OEHYD ROOetASE MRNA, COMPLCTC CDS 
MJMOSPOC HUMM OLVCERALDEHVDC 3-PHOSPHATC DEHVDROOeUSE MRNA t3M8P 
HUMOiPDR HUMM MRNA FOR OLUCCSS«^HOSPHATGOCHYDROaEMASfitO«PD).2«>4BP 
HUMAST HUMM OASTRM OOC, COMPLCTC COS, ITttP 
HUMOAST) HUMM OASTRM OBIS. COMnETE COOMO SCOUeCC.1 21 7BP 
HUMOC HUMM OROUP-SPECJFICCOMPONe^VTrAMMMMMa PROTEM MRNA, 
MMOCS HUMM OLUCOCERaROSIOASE MRNA 2227BP 

HUMOCBL HUMM LYSOSOMM. OLUCOCEREBROSOASE MRNA. COMPLETE COS. 1 7t»P 
HUMOCRA HUMMOLUCOCORTVaO RECEPTOR ALPHAMRMLCOMnCTS COS. 47UeP 
HUMOCRB HUMMOLUCOCORTCOtD RECEPTOR B6TAMRNA, CCMPlfrC COS. 37«lBP 
HUMOCSFO HUMM OeiE FOR GRANULOCYTE COL0NY-8TMULATMO FACTOR (O^F). 
HUMOCSFR HUMM MRNA FOR ORMULOC YTE COLONT tTMULATMO FACTOR (0 CSP . 
HUMOCSFRI HUMM MRNA POR ORANULOCYTE COLCNY^TIMULATMO FACTOR iOCSf) MMSP 
HLMOFB HUMM PREPROMBULH-LKE GROWTH FACTOR H tKJF^l) MRNA, COWXTE 
HUUOFIM HUMM MSU.M1KE OAOWTH FACTOR (lOFI) 00*. EXCN 4 OF 4, «7«P 
HUMOFIMS HUMM MULMLKE GROWTH FACTOR (IGF.4WOeCEX0N 4, MWP 
HUMOFS HUMM MSU.MLKE OROWTH FACTOR • (lOF S) CONA TO MRNA tOWSP 
HUMOO HU«MPREPROauCAaCNOe«,CCMPLETf COONO SE«««fi.MSSBP 
HUMOH HUIM OROWTH HORMONE (HON: SOMATOTROPMiaCHC. COMPLETE COS. t»MBP 
HUM0M4 HUMM OAOWTH HOfMONS 0048 (HOH44). COMPLETE COS. 2I67BP 
HUMOHV HUMM OROWTH HORMONE VARlMT (HOH-V) OBC AND aAM(8, 2M0BP 
HUMOLVTRN HUMM (HSP02) OtUCOSC TRANSPORTER Oe4C MRNA COMPLETE COS. 2BS4BP 
HUM0LYCA4 HUMMOLYCOPflGTEM.ALPHA-SUeUNrrOB«;eXON4M0R>M(S,3«7BP 
HUMOLYPL HUMMUVCRaLYCO0e4PH0SPH0RYLASe MfMA, COMPLETE COS, 2«1«BP 
HUM0P3A HUUM04OOTMCLIN.MeSRANEaLYCOPROTEMI»A(GPItA)MmA3inBP 
HUHORFPS HUMM OROWTH HORMONE -RaCASMO FACTOR (GRF) GB4E. EXON (, 2MBP 
HUHORPSC HUMM GASTRMAaCASMO PEPTIDE PARTUL »- REOKM. 7V7BP 

HUMM M.PHA OLOSM GB4C auSTER ON CHROMOSOME 1«: ZETA OENE. 2WSaP 
HUMM ALPHA QLOBM REOION ON CHROMOSCMi IB: PSt^ALPHA-1 . ALPHA 2 

HUMMBCTAaOBMR£O»0N0NCHR0M06OMCll.733MeP 

HUMHB10Q HUMMOeCFORH0TCN6H)tO).ll1OBP 
HUMHUDZA HUMM»CACLASSnALPHACMAMOe4EOZ-ALPHAM9l8P 
HUMMOOBR HUMM MRNA FOR HLAO CLASS II ANTIOEN DO BETA CHAM. 1 323SP 
HUWCDPVW HUMM MRNA FOR HLAX» CLASS ■ANnOe4DP1« BETA CHAM. tOOIBP 
HUMHLDOIMB HUMMMRNA FOR H>-0 CLASS i ANTWet OOWI.I BETA CHAM. I21IBP 
HUMORtB HUMMMflNAFOR»UOCLASSIMT)Oe4DRlBETACHAM.M«2BP 
HUM40t4 HlMMNOPI-ltSTOM CHROMOSOMAL PROTEM »iM-t4HmA,C0MnETE CDS, 
HUM»M0t7 HUMM NCN-MSTONE CHROMOSOMAL PROTEM MM- 1 7 MRNA COMPLETE CDS 
HUMHMOCOA HUMM S-HVOnoXYOMETHVLOLI/rARVL COBIZYIK A REDUCTASE MRNA 2»04aP 
HUMHPAtB HUMM HAPTOOLOSN «.PH^IS)«CTA PRECURSOR. MRNA. t2»4BP 
HUMHPAtS HUMMHAPTO0L0BNALPHA.IS(HPA-IS)MRNA.C0Mn.CTCC0S.I>2»P 
HUMHPA2B HUUIM HAPTOOLOSN ALPHA(aFS]«CTA PRECURSOR. HMA. 1411 BP 
HUUHPARSI HUMM HAPTOOLOBM GENE (ALPHA2AUaO.C0MncrE CDS AND tmiBP 
HUMHPRT HUMMHYPOXMTHMCPHOSPHORSOSVLTHANSFERASEO*^ MRNA 133 IBP 
HUMFNIiK HLMM MTERFERONWOUCED 1 H(OA PROTEM MfMA. COMPLCTC COS 43S6P 
HUMiniAOl HUMM MTERFERON ALPHA oecm-ALPHAICCOMnETE CDS, itUBP 
HUMIfNAOa HUMM MTERFERON N.PHA Oe« <FHALPHA 7. t««9ftP 
HUMIFNA04 HUMM MTERFERON ALPHA Ge«*N.ALPHA a. aOZ2BP 
HUMffNA20 HUMML>MPHOCYTEPRCirTEflFGRONALPHATVPC2»l.1X3WP 
HUMMAA HUMMLCU(OCYTEMTCRFCRONMIF)*PHAAOO«.I7M8P 
HUMFNAB HUMMLCUKOCYTE MTERFEnCNtlFN^HA) ALPHAS MRNA IMtBP 
HUMIFNAC HUMM LCUKOCYrCMTCfVCACMim-ALPHAtALPHAO MRNA MOBP 
HUHIFNAO HUMM LEUKOCYTE MTERFERONtfHALPHAIALPHAOOeiC.1 1 71BP 
HUMIFNAF HUMM LCUKOCYTCMTERFCRONdFN^PHA) ALPHAS MRNA OMBP 
HUMIRMFM HUMM LCUKOCYTICWTCRFER0NALPHAF(m^iH.PHAR MRNA, COMnETE 
HUMfNAOS HUMMLEU(0CYTEWTERFCR0N(1fHALPm).J.t«MOeiCS.tWTBP 
HUMIFNW HUMM LEUKOCYTE MTERFERONIVHM.PHAIALPHA'H MRNA MSBP 
HUMmwO HUMMLCLK0CYTENTERFER0Nt»N ALPHA)ALPHA H20M.l«1ttP 
HUMFNAI HUMMLCUKOCVrC HTERFEflON (VM- ALPHA) ALPHAl0e«.lt73BP 
HUMHHM HUMM LCUKOCVnCNTERfCRCNALPHANOnMLPHAN) MfMA. COMPLCn 
NUMMATA HUMM ifTERFERON ALPHA OeCJFN-ALPHAI A COMPLCTC COS. 1 1 aiBP 
HUMOTMTB KUMMNTCRFCRONM.PHAOe^»N-«.PHA •.COMPLETE CDS. tt*4BP 
HUMFNATC HUMM MTERFE AON ALPHA OBfC IR4-N.PHA ». COMPLETE COS. t47«BP 
HUMIFNATO HUMM PfTERFCAON ALPHA oe4Sn«^PHA«,aaBP 

HUWfNAWA HUMMMrCflFEnON^N.PHA-WAGENE.COMPLETECOOMaSCOUeC8.uaP 
HUUR«tF HUMMFWKmA8TNTERFERONtVN«CTA1)Oe4E«;ORJimS,IUWP 
WMFttt* HUMM BCTAIMTCRFERON MfMA. COMPLETE CDS. nZWP 
HUMFNO HIMM ftMlPiC POERFCRON { FW OAAHA) OM AND RMKS. MI8P 



WMWOM ►OAMTiaiaiAHUjOCYTEAIACROI^ HU*M IG ACTWC HCAW CHAM CPSICM-I OWI.CONfTAW RCGW It^ 

"^""•^ 1«.«miULATW0F*CTD«(»«JM««F) MIAM.1 HUMANMONOCYTl MTtRLSlKN I (I t) l»4A. COiWLBTECM. MMSP 



HUMCSFOMA HUMMOMMULOCVTC4AACR0PHAOfiC0LONY4mMULATwaF 
HUMCTHO HUMM CAT>CPSMDMRta,COMn.CTE COS, 20MBP 
HUMCYP14S HUMMCYTOCHH0MSP-t-4SO(TC0D-MDUCaLE>MnNA,C0MnCTECDS 
HUMCYMM HUMMOe4CP0RCYTOCHRaMfiP(IHM,M2MP 

HUMCVWA7 HUMMCYTOCHR0MiP-»S04OCNB.EWNT.ANOTHReCAUJRePCATSM 
HUMCYK;i7 HUMMCYT0CMR0MBR4i0Cl7(8TERO«)17,ALPHAK«)ROXYLAS*l7.l0173«P 



HUMLtSR WMMM(MAFOR»fTEn.CUKMlBCTA tIMBP 
LIU t» HUMMNTEf«.CUKM^t (L I)l«MACOMnETEC08, VOW 

HUMM MTEn.CU(M2 (I. -2) 0048. C0MPLET1 COOMO SCOUeCC. B737BP 
HUMM WrCn.CUKM 2 (L-2) Oe«. COMnCTE COS. AND FUM(P«0 MMBP 

HUMM MTEfl.CUKM 2 (L 2] Oa«. COMnCTE, WtSP 

HUHJfH HUMM PnEflLCUKH-2 RECEPTOR G0K CXON 0 MO fVM4(S. MMBP 

iw^mo iiiiiic^'ikMmPMoHm^ humwra •Sl'S^J^L^l^c^'lS!^^*!^ 

HUMCYFSCC HUy|MCH0LCITER0LS»«HA«aEAVAO8OirfMCWS0CI«IA,CO»»LETEHUM«jRX» **f^*n^^^*^*,^^^^^°^i^J^**^ 
MMW HUMM SERUM VTTAMMO^MOMO PROTEM 0«BP>MfMA.COMnBT8 COS. WmJM ****tTJ!^^J^3S!t S2L*,iiTi^ 

HUMECOF* HUMM BCTA0IOOTWLIAL COL OROWTH FACTOR (EeOF«TA»MIMA««P l*»«J ****** ^gg*!^?^ ? YS*^ 
HUtfCFlAR HUMM MRNA POR aONOATION FACTOR I ALPHA SUBUNfTlCF-l ALPHA). MMU J^^^J^^ff^^^^^i^Jf^ 

NUMEOFRN HU*M EPWERMAL OROWTH FACTOR RECEPTOR MRMA, COMPLETE CDS. »3a8P >*»^^ 

HUWMFR8 HIMM ABCf»ANT(8H0flnCP«iRMAL OROWTH FACTOR BCCEFTOR MRNA. ™- « .*«oo 



HUMMHA WMM P4MBMASUBUP4rT MfMA, COMPLETE COS. I33AP 
HUh»«1 HUhl>MN&U.NOe«MCt.U0MarAM>yFUMCS.4«44aP HUUStSM HUMUMC^ MM BCOOWPUmrr OCRnCOOnOltfm FACTOR BCMAN 

HUMMSPn HUMWN^HA-1^ieU.N0B«M«}S*aJM(NaP0LYM(X«>HCReaK^ HUtOtSPtJO HUUiMC«S^FUmET-OCf«VIK>OfWWrN FACTOR t(8t»roOF9yf*M. 

HUMMftfl HkMWI«ULMRCCCPTtMIMNA.COyn.CTIC(».4721gP HUMBOO HUUN CU2N ftUPEROXBC OtSUUTAftS (SOO) COM FUTI COS. 1746? 

HuuMSfu HUhM4»MULNnEccFnmHmA.ooMnj;Ticoe.«imp wmsooi HUUMSupfRoaaosnaM/rAUfSO&iiMnNi^cciwLmcos.HOBP 

HUMKTia HlM«NMr-1UMUARVONCOOOC.4U3BP HUMSOOW HUUW SU^flOXDC DOUVrASC (BOO-I) OOC. EXON » iM) RJV«8. WOSP 

WMOOa MMWISa«l(0e«{inS«VO4ITMUUTE0aBC)BC00t4aAMK0A MUMOM HUUM4SOIlArOITATWIOEMAM)ajlNKft.aM7B» 

HUhKCRSP HtWM«>KDTVPtlCPI0CT*W. KgWTNOa«.COMW.ITtCDt.MMBP HUUTBM HUUM BCTA>T\»U.M OOS (»WA) MfTXTBI N.U FAMLYUOBCnS. 

HUyK£llEP* MUW»<MQ(CPIDCPMM.>CBWTtl.TVPt ■)00«.EtONriOWP HUWTBOUM HUUANBCTA'TUBULH OBtC. CLONE Ii4a II ITBP 

HLMCMIO HUiWI(IMC»0<OB«.CXONlOiQCOOWaBRAOVKMMN«>CXONn. HUMT90 HUMAN TMVnOX»C«»OMaQLCKJL»4Mm^,COMn.ffTV COS. ItTW 

HLAUtfTTAt HUMAN PRS-ALPMMJCTALSUm MA. TDIBf HUWTCAM HIMAN T«aX RCCOTOR ACTIVE ALMAhCHAMMRNA FROM JM CSX tMC, 

HUMJMC HUMAN LNUMCHRNA,C0MR.CTEC0S.1WW HUWTCtXA HUMAN T-COL flECCPTOR ACTIVE KTA^MAK MRNA PROM C£U IMS 1 1 tt BP 

HUU.CAT HUMWLECmwCHOLESTEnOLACYlTRAMFEnAaEHmA.COMR.ni CDS. HUWTFRR HUMAN TRINSFERnMRCCEmiRMnH^OOMPlETE COS. ttMS^ 

HUICDHA HUMW LACTATE DPIVPftOOEN M E A »OgYMi MRNA. COfclETECPS. HUMTOFAM HUMAN (CBl LM t027 PIT) TnAMP0fWNaanOWmFACraVALPHAM^N^ 

HLU.DHAT HUMN4LACrATCDem)HOaeMSfiWkO8«,EX0N7.«9P HUMTOn HUMAN T lV >NB fO«i y»IOOWOWfTH FACTOR ■SCTA(TOF«TA|MHMA.COMR.ET^ 

HUU.DLR1A HUMmLOWOe«fTYLVCraOTSMflEC»TORafiNE.EXONll»Mer HUWTHVMA HUMANH«mmi08MALPHAMR»M.C0MrUTE COS. IIMftP 

HUU>« HUMM4 LVTCMZMO HORMONE (mOCNl.KTA4UBUMT.IMW HUUTHYWAA HUWNfROTHVMOSM ALPHA UFM^ COMPLETE COS. 4MSP 

HUtCHRH HUMAN LUTENaNOHOfMGNEnaEASMO HORMONE (LHRH»HnNA.COMrLETE HUWmVB HUMAN THVMCVLATE 8VNTHASC WMA. t«MSP 

HUM.T HUMMLVUPHaTOKMMmA.COMfUnC0S.13MP HUWTm HUMAN MMA FOR TISSUE SMSTTOR Of METMlOmCfTOtASES (TWP). TUSP 

HUUAAS HUMMHASPROTaCNCOOeSMmA.C0MPl£TSC0S.13M0P HUUTK HUMAN THVWKME KHME MMA, COWLFTE COS. IttlBP 

HUMIER HUMMI«TAUOn«ON0MIOe«S<T-f)ANOR>M(8.inaSP HUMTKRA HUMAN IHYWfWS MHASE OM. CCMUTE COS. WITH CLUSTCRB) M.U 

HUMMEnra HUMAN l«TAlL0rTHa<IUPBEUDOOeCfyrT.IPS».«0TBP HUUTNFA HUMAN TUAORNECROSn FACTOR QOCCOMPLETf COS. JUatT 

HUMETU HUMWMETi41CrmiONeN-t-AaB«.CCMPLETECOOSiaSEOUBICE.2MlSP HUUTNf* HUMAN LYMPHOTOKM (TNF4ETA) OM. COMPLETE COS. KDTSP 

HUMIETV HUMAN METAUOTMamt^ OWE <Wr-O.I7nSr HUHfPA HUMAN TWSOE Pi A S W OMH ACTTVATOR (T^fi 00«. COMPLETE CDS. 

HUMETF MUMMIMCTALaT>«0NEM4FOe«C>*rTF).ftasaP HUMTPAU MMN«TnSUE-TVPEPLittMilOO» WTWATOR(TPA|0eC.EX0NU. tnSP 

HUWCTV1 HUMMUETMXar>SOI»l»fT)I^Oe<. COMPLETE COS. t07«P HUUT?AR HUMAN TISSUt-TVPE PLASWIOaa ACTIVATOR (T^A>HmAL«CKNO FMOCR 

HUNHOl HUMWMVOaL0eMaB«.C0MFUTEC0B.CX0N3, 1>2«P HUUTP1 MJMN TnOSVHOSPHATE •OUEHASfi URN<COMR.ETE COS, ISMP 

HUbMH HUMAN CLASS I TnM«PLINTAT10NAMnaB4(»CA)G8C4lX)aP HLftimOPA HUWN rBO OSL AS T MUSCLE-TYPE TROPOWTOSJW MWW. COMPifTE COS. lOMBP 

HUMMHAI HUUW MHO CLASS I HAAS OeCMOOO^ HUUTBH3 HUMAN THVRUTRCPM (THYROO STMULATNO HORMONE) SETA SUBUMT OM. 

HUUMHAS HUMWMHCCLAaSIMJMOBCSnnP MHTUBAO MMANALPHA-TUBULNO0C<B-«.PHA-1).CaUPLOECOS.4M7tP 

HUMIHCN2 HUMMSTEROOt1-HYDROXVLA8E(P-4M(C2l}lBOBC, COMPUTE COS HUUTVIBAK HUhlW ALPHA-TUSaN (FROM KERATMOCVTI CEUft) MRNA 1 MSP 

MAWHCPS2 HUMAN8TEROCtl-HyDnOXVLA8CBO0C.COI«>LCTECOS.att29P HUfTUSBM HUMAN BETA-TiaULH0OS.C0t»L£TE CDS. UMSP 

HUMIHCW9 HUMM«MHCCLASSI>1ACWI0BC.3T14BP HUMUKUl MMAN PRE PROUMW MASS IMA, COMPLETE CDS. I «7W> 

HUMUHOCtA HUMiMUHCCLA88ltMAOC1-N.PHAOe«(0RIM.lM),tm^11)MP WASMPffi HUMAN PREPROUnOKMASfi MIMA. IMOSP 

HUMriHOCae HUMMWHCCLAS8flWA-DC^»«ETAOB«(Oft3A«0MftP HUMUPAX HUUJM UT A OCNE FOR UROKMABC-FUSMHOOa ACTTVATOR 7M8P 

HUMIHOCMI HUMW* CLASS I HBTOCOMPATISLrTYANnaei DC-ALPHA CmMMRNA. HUMUROO HUMAN UROPOR^HVWIOQei DCCAnKMCYLASfi MRNA. COMPLETE COS. 1 1 fTfiP 

HUMytHDCS HUMMUMC CLASS ■»CA^'BETAaOS(DWI«ft«) AND aAWS.TTTttP HUWVM HUMAN VSOfTW 0018. COMPLETE COS. IT4«P 

HUMMMe HUUM UHC CLASS HHU ORB OA«ETACHAMMmA, COMPETE COS. 1M7BP HUMM HUMAN VASOACTIVE HTESTMtt. PaYPEFTOConP) 08IE. EICON*, ISSP 

HUMMORA HUMAN HLA-DRAWTOO* ALPHA C»WWMmAtrVSFRAflMP»TS-llHSP HUMVWHW HUMAN VASOACTnff HTESrVM. POTIDE AND MSTDMB-HETMONtS lOSSP 

HUMIHORW HUMAN HJtOR M.PHAOHAM MMA ttnSP HUUVIPMRI HUMAN VASOMCTIVE NTaTMN. POLYKPTBC (VIP) Oe«. EXON 4. n4SP 

HUMIHD«A HUWNimC CLASS 11 OR«rrA«HAMMmA(OR:i.WI), CLONE SETA-4. HUWVPW HUMAN Pn^ROS^MOSWE VASOneSStMSUnomVBM ■ OBC. COMPLETE 

HUMIHOme HUMAN MHC CLASS ■OR«ETACHAMMmA(ORa.Wt), CLONE BCTA^. HUMVPNPA HUMAN VASCPREBSM-WUROrHrSM I URNI^ COMR.ETE COS. MlftP 

HUMIHORBC HUMHMC»UCLAS8IOR«TA-1(IMn4)HmA.1UtBr HUWVTMSP HUAAN VTmONECTM (S^ROT^ IMHA. COMPLETE COS. lUSf 

HUWIHOnC MMWIHJ^ORilWnoaiBETA^WmAnnBP lAKMS UaW(SVSI.FASCCUUMS)PREPR0MSULMMmAM3S» 

HUMyWDRMA KUMMMHCCLAS8ltMroeiHLM)RALPHAHEAWCHAM.COMPLErEC08. IMtMETI MOMUiY MET ALLOHfCMEM I (WT^ IMWA. 393*^ 

HUMMHDR83 HUMAN HU<OR ALPHAOUW |CHAM^> OM. CXCNS 2. 3. 4 AND ». IMOIETI WOMtEYMETAaOTMONEM n(MTT4 hSMLSaS^ 

HUMynCSA HUMAN HAM ALPHACHAMMmA WISP 0RA»«A01 ORANOUTAN ADUT ALPHA4-OL08M OSS. WSP 

HUMIIOXAt HUMM4MHCCLASSHWAOX-«LPmOOCtDA*,«W»,CXON4.0»P ORAWAflS OAANOVTAN ADULT AmtA-l OLORM OM. tMSP 

HUMIHQM HUUMUAXRMSTOC0iuPATm.rTVCLAS8l4MnaaaAMMACMAM.URNA 

HUMItt HUMM MUaiERlM MBTTMO SUtSTANCE ODC, COAVLETE COS. StOOBP »*»** ROOOfTS 

HUMIYCaU HUMAN{BLI3)TRW«SLOCATH>T(t:l4)&4nC0NCOaM.eX0N3AN0lMI»P OMCASA OUS^A POCASEM A WMA. CCMFlETt COS. tOMttP 

HUWrrCC HUMM(L*WN)C4IVC PR0r»0NC00 O« .C0l*LETEC00H0SC0UeCEADS2BP OPfMS OUt«APiaPREPnOMSULNOBC.VAM)rnjM(8.MTaP 

MMIVCOT HUMM (OAOOQ TRMSLQCATEO lytM} CM1C CNCOOENE WmA. CCMPLCTE OPUCTAL aUMEAPIO PRE^LPHA-LACTAtSUMMURNA. TWST 

HUMIYCa HUMm(ADMM)C4«VCPROr»ONCOOOC.EXONSANDFLAMCS.C7l0^ HNuBESa HMMTEA DE8UN OME ENOOOMO THE MTEflMEOUTI FLAMWT tSAMP 

HUMIYCFS HUMAN FETAL LIVER C^rVCPnor&ONCOOaCEXCN} AND rUMtS.I0O*SP HMttHRM CHNCSE HMISTER DIHVDROFCLATE WDUCTA8Eae«CaONE AMS. tUTBP 

HUMIVC03 HUMAN (a»4OEfM.MECAfVCPR0rrO<MCO0e«,EX0NSN«)3rRM( HMOHRir CHMESfi HAA0TER 0IHV0n0fO>TlRE0UCTASE0e4E.aCNE U01M7. WlBP 

HUMAYCM HUMM(KH2)C4IVCPROr»ONCOaENB.UnNA.t»lSP MMOT HMMTER B.ONaATKM FACTOR S MIMA, COMPLETE COS. tSKlQP 

HUU4VCPC* HUWW04«VCM iySMA,SSTUTWa FROM PROMOTER P0.(ICWVC3.1) HMIOO BYWANWMSTBS PR EP ROOLUCAOONMRW. COMPLETE CPS. II If 

HUMIVCRT HUMW (RAJ) TRANSLOCATED Ti(«l4)C«IVCaN000OK, COMPETE COS. H4AI0SR CWCSE MMSTERURNA POROLUTAUta SYNHCTASE 14rBF 

HUMIYP HUMAN IfVGLOPCTOXIMSfiimA. COMPLETE COS. ttMS^ HNMA0t4 HAAWT^ »4fVDR0MV44CT>nLaLUTARVL COeOVUS A (MM COA) MMSP 

HUIMX HLMANNEUROLEUKN MNA, COMPLETE COS. 1MW H AASS4QCU SVRWMWAaTetCYTOPLASMCWfVDnaXV-MffTKaOLIjrAMVLCOEKrMEA 

HUhMPV HUMANWUR0r^n0fiY9rV)PRCCUnBORURNA.HtSP HMSMT CMNESfi HAAMTTBI WRTURNA. 13a9P 

HUWAASR HUMMNAASURNAM«aAM(MaREaiONS.»ia«P HMMETI CHMESfi HNMSTBI UCTALiOTMOCM I (tfT* MRNA. »0«SP 

HUMOAS07 HUMAN(r-<^aOOABVMTHETASfiB0B«.EX0N7AN0FLAM(S.430gP HMMETI CHMESfi HNMBTBI METALLOmiONEM 11 #rT-4 URNA. SatSP 

HUMOAT HUMAN OmrrWNE AAMOrT TUWfOVm IWMA. COMRJTE COS. tPUSP HMtTRP BVRUINO0LOHNy«TERSCRAPIE(PRI0H PfWTEMPRR nXMRNA. lAllSP 

HUM0P6 HUMAN OPSHOOCCOMPLCn CDS. MAW IIAA» n p | HAM8TBI (SvnHN OOLOf PRP OB4S. eOMPLETE COS. MOW 

HUMOrC HUIM0mrT>mETIWaCAf«WIVLA8E(OrC)URNA.C0B*UTEC00M0l4MaP HMPRPHM OOLOOI SYRIAN HAUSTn PRP OM »Q», BCOOSO ACtOC PfCLME-RICH 

MMOTNPI HUUMPR&ROOmOC»MSUnOPmSMIOBC,COMFUTECOS.maP HIMRPSM CHMGBE HMVTBt OVARY (CHO( RVOSOWL PROTEMSU WML COUFLETE 

HUMna HUUANHA^MTX»fAS80CUTn»fVARUW(rCI4AN(PaatMnNAt3(MaP WMMPStr O ■ HE I lAAW H I I (C OflttEUS) RBW e C MAL PROTEM RPSt 7 IW<A. COMR.Cn 

HUUPUII HUUANCaLULARPHOSPHOPnomMPS»aB>lfi.EXON1l.1|MSP mtJKT CHMaEHAUSTmTHVM0MEKNASEae4S.EX0Nr.SHSP 

WWPSX KUiUNPUCaiULARTUKnANTia»MflNA.C0MaFrEC0S.t3ITBP HAMTISAA CHS«8fi HAMSnnN.PHA-TlAULMII«NA,COMFLEn COS. 1«2«P 

HUMar WMANP«S CBJLOAR TUMOR ANnOCNMRMLCQMR^ COS. l7*0ttP HAAfTUSAS C»W6SE HAA«nR ALPIW-TUBULM I URML COMRjBn COS. t«S4aP 

HUWtn HUMAN UCANOUA ASSOCWTED AWTIOEN Pf 7 (MaANOr m AN Simn s O MmA. HAAITUSAC C HMOS MMSTBI ALPIM-TUOULMM UnNA.CCMUn COS. tlTMP 

HUMTAIt HUMAN PLACENTAL PLMUMOa» ACTIVATOR MMMTORU(»U.COMPLEn COS. WAMM7 WtlSnR VSmTMtNTEMEOUTE FUtBIT) OBC: EXONA JWOFUMKS. 

HUMPA8* HUMM«aLVCOPIIORMC(PAS^HRNA.COHPLEnC0«.anM MUSA C ASA MOUSE iKacrALALPHA-ACTMa0«B.COMPLEn COS. AOOW 

HlMPOKl HUMMPHOSPHOaLYCeWEKMASE(PaK)MRNiLBXONBtTOtMT.tt33BP MUSAC»WM MOUSE WMA FOR NEURN.NCOTMB ACETYLCHOUNE RECEPTOR ALPHA ItUfiP 

HUUPOKAII HUMAN XAMCa>PHOSPHOaLVCCHAnK»USSOENB,EX0Ht1.M6SP M USAC UR i MOUSE MCOTSSC ACETYLCHOLMB RECEPTOR SETA SUMMIT UfMA. tOTttP 

HUWIM HUUANPHOIVLALMMEHVmaxnASfi MIMA. COMFUn COS. SUMP UUBACWO UOUSS ACETYLCMOLM RECmOR DCLTASUSCMrT URHA.COMPLEn COS. 

HUMPM HUMANKTATVPE PROTEM KNABSC MM. COMRSn COS. M6B8P MUSACtM MOUSE BKaCTAL MUSCU ACTWimA,COWlEn COS. 14 118P 

HUMPLA HUMAN aACCMTALL«TOOENHORMmE:WL<aaB«E AND FUtfS(S.tW7BP UUSACT1R MOUSE CVTOSKaETN. MM FOR SETAWTM IMSf 

HUMPLS HUMAN PLACafTALLACTOOSN HORMONE: HFL-4URNA.7HV MUSADAM UOUSS ADOMBMl OSAUNASC MRMA, COMPLin COS. tsnSP 

HUriPNU HUMAN PURMNUCLEOSnSPN0S»4ORVLASfi{PNr)MRWL00MPLEn COS. UUSAFF MOUSE ALPHAFETOmOTEN (AF^ UISM A OM FUMtS. . I07«IP 

HUMWSM HUMAN PURS«NUCL£OSJDEPMOSPHORYLASEO0«.EXON».31»P UUSALBR MOUSE M«iA ROR PREALBUMM *14Br 

HUMPOLS MSMM POLVMERASfi BETA URNA.COUPLEn COS. ttnS^ UUBAAmu MOUBEALMA^SrVLASE-l MRNA WTTHOUTLEAOCR. 1A1ISP 

HUffOUC HUMAN PROOMOI&JINOOORTMtPOMOOBS.COMPLEn COS. lAMSr UUSAAIYW UOUSE ALPHAAWLASfi-IOfiNE: MNCRBATKMNA. IfTMP 

HUihrOMCt HUMAN PROOMOMELANOOORTM{POUOO0«.EXON».tXMSP UUSANF MOUSE PNOOM MCCOSO ATRUL MATRRJRCTW FACTOR COMPL En COS. 

HUIVPT HlSAANPANCflEATKP0LyP9T10E(Pf)MDC0SJ»^n>E PRECURSOR URNA, UUSAPOWA UOUSS APOUROPROTEMA-IVOBCCOUR^ COS. MtOSP 

HUMC7 HUMAN PROTEM COENB,E]C0Nf OF •.11MSP UUSASP MOUSE AOPOCYn SBSNB PRCmASS Qe«. COMPLEn COS. tMMP 

HLMPRCA HUM«NPnonMCOB<E,C0HR.CnC08.n7BSP UUSASPATC MOUSE CYTOSOUC AS»AffTAn AMNOT R AN SfWS E tSOBgYWE MmA. IA46SP 

HUMPflCM HUMM PROTEM C MIMA, COMPLEn COS. 1A43SP MUBASPATM MOUSE WTOCHGNORUL ASPARTAn AtMOTftAMFCUSfi OOeOVME MRNA. 

HUMPRL HUUANPRmOLACTM(PflJURNAKSaP UUBATCUT MOUSE T«ai SURFACE AKnOWLSrA MMA. COUPLEnCOS. MtTSF 

HUMPRL7 HUMWIPR0LiCTMaetfi.EX0Nt,77tSP UUSATCTSt HUMAN T'CfiU SURFACE ANTIODITa DaTACHAMOOC. EXONS SA4 AND 

HUUPRP HUMMPRI0NPROrEN(PRP)MRNA.C0MR.EnCDS.M1«SP UUBSANOS MOUSE BAND SMFttA0COOtlO AN AMONEXUMMNtOiOAALUMAME 

HUMPRPA HUMAN PRW LOCUS SALfVARVPRaS«-RICH PROTEM MRFMIPRlMiaE). MUSSFKW MOUSE •OE'SMDMOFACTORMRNA.OOMPLEn COS. SJOMP 

HWPRPS HUMWPRHt LOCUS SALfVARVPR0LI«4SCHPROnMURNA{PIFAUjELE). UUSC31 MOUSE OOUPLOMMT OOMPCNOfT 03 OM.ir M. SOeS^ 

HUMPRPC HUMMPWf LOCUS SALWARVPRCLME^SCH PROTEM l«NA,a0NE CPS. MUSC38 MOUSE C0 M R . O WT COM P C NPIT 03 MWA. ALPHA AMD BETA BUSUNffS. AMTSP 

HUMPRPO HUMW4PRB1 LOCUS SALIVARY PR0LMS4«CHPROnMMRNI^aONECP4. MUSCAAI MOUSS CARSOMC ANHVORASfi •OTrMB N COMPLEn COOMO AND » MRNA 

HUMPRPE HUMMPRB1 LOCUS SAUVARVPROLMSACHPROTmum^ CLONE CPS. MUSCALP MOUSE CALPACTM I WAVY CHAM (P3q MM COIVLEn COS. laOTBP 

HUMPRPF HUMMMW LOCUS SAUVAffYPROLHE^SCMPROm* MIMA. CCMKEn COS. MUBCAIK UOUSE KAPfA«ASGN URHA. COMPLETE COS. TMSP 

HUUPRPHI HUMINPRK10OIE(HAE»TYPESUSFAMLY:AfciAfiPnH14)B«O0Ma MUSCCPA MOUSE Cit MmA BCOOMOTCELL SPfiCnC PROnMCCPI. IJaMP 

HUMPflPH8 HUMINPRma&tfiOMeN-TVPfiSUSFAA«.Y:AUafiPRKM>9COOSa UUSCKURO MOUSE MUSCLE CREATME KMASE MM (EC t.7^, H1ISP 

HUUPSS HUMAN PSa MM COMPLSn COS, FROM BRSASTSREAST CANCER CEIL LM MUSCklYBR MOUSE UVS PROnXWCOOM MmA FFMOMOfT FOR CAfVS PROTEM. T7MSP 

MJMPSPA HUMAN PUJiONMVSURFACTAMr-ASSOCUTCO PROTEM MM COhn^ COS. MUSOMYCS MOUSS &AIYC OM ESCON* AM>* RAMC MISP 

HUWSPS HUMAN PUMONAmSURFACTANT-ASSOCUTnPnOTOI MM COMPLCn COS, UUSCRR MOUSE t l SHm PROTWASE MM COt^LEn COS. tTWSP 

HUMPSPtA HUMIPUUIONARVSURFJCTMT ASSOCIATED PROTSNPSP-S MMA naMP UUSCRV MOUSE CVSTBC-nCMSfTESTMAL PROTEM |CR#> MM OOI*LEn CDS. 

HUMFTW HUWW PMUTMmOO(PTH OEMS: OOOMORfiOION AND mANtllMRP MUSCRYW MOUSS LMSffTACRVSTALUN MM EXOMS A 4. HOSP 

HUMRAFR HUMMMRNAPORRAFONCOOBAaTTlP UUBCRYOl MOUSE OAAMACRYtTALLN- 1 URNA,CCMPLSn COS. MlSP 

HUMHMH HtMANO'HARASIPWOrEKMCOO W .COMJnCOOSiOSEOUPCS.AAMSP UUSCRVOS MOUSE OMRAA CRYgTAUJK URNi^ COMJn COS. IWSP 

HUMttKS HUMNCaiULARPRaroONC0afiNEC«MUS3,EX0N4AANDIUM(S.AA«P UUSCRVOAJ MOUSE Q MSSA C RVSTALLM-4ttO«. EXCN3.AAASP 

W MRASNOt HUMANW.RASPROroON CC 0 EN E.EX0N4.inSP MJSCSPQMA MOUSE ORANULOCYTEWR C PItA OS COLONY STl<ULATMO FACTOR (OitCSF) 

WMMN4 HUMM«cexULARNMPROrO«NOOaOC.ExaN4.t2aSP UUSCSPQAtt MOUSE ORANULOCVTfWCAOPMOfi COLONY BTMUUTVO FACTOR (OMCSF) 

HUMOSP HUMMIRETMIXSM»aPnOTCM(RSP>MMeOM^CfiS.MAP UUSCVCtUC MOUSE SOMTC CVTOC»«0MS C QB«. O0««LSTI CDS. MMSP 

WMUa HUMMPR»ROR&AXMHIMMCOUPLEnC0S.HMP MUBCVPMS UOUBE CYTOCHAOM Pl4tOURNA, OOMPUnCOOSta SGOUOICfi. MIlSP 

- MU8CVPUX MOUSE CYrOCMOaP-t-4HQmB,COMPUm COS. MM8P 

MUSCma MOUSE CVTOCMMS«P44WaB«.00MPLXn CDS. MMBP 

„ L PROTEM LStmSP MUBCYPMA MOOSE CYTOCHROME PI H O MFSM OOMWjn COOMO SfiOUENCE. lAWSP 

^ ^ LPfOT««S140QC.COMFLEnCOS.iMaP MUSSOF MOUSE EP SJE WAAL ORCTWTH FACTOR (BOF^MRN A. 474«P 

HUWVSt7 HUUANflraOMALPRarTmSt7M«M.CCi«LEnCOS.4T7SP MUSfiOflPB MOUSE ^OCTMALOWOWftH FACTOR SMDMO PROTBM TYPE S MIMA, «>t6P 

HUMHOBR HUMMURFMPORtSACLMiAWnomBSrSETACWLUfflSP MlM E I F lAL MOUSE •mmON FACTOR 94ALQNaR0RMMPNA.CCMPLEn COS. tTTttP 

U SSIBISI WJMMQENEPORtCA CLASS PBSa BETA CHAM (E»ON»H. IIMSP MUSEIF4AS MOUSE SSTUTCN FACTOR EIF4A SHORT FORM URNA, OOMFUn CDS. 

HUUBa4a4 HUMMOCNE FOR »LA CLASS DSS4 SETA CHAM (GtON 44). nUSP MUSERP MOUSE ERYnMPOCTN OOlfi. COMPiCTB. 3St1SP 

D OROWTH FACTOR, UUSPOU UOUBE OSmMMLATE REDUCTASE OENElEXON VI AND > END. 474SP 
tMUK» yOUUC4Qaoe«;Cei.lJUWHOHOLOaTOmM.ONCOOBS.a»M8P MUBTMVST yOUUniVMDVUTimmuSfiOM.CXONT.ITltr 

uusocsF iiouu(Mmui)cvncaowr-mitiUTwawTOA(0'C8ft iiuvtnf MouBCTXjyoniccnoe«FACTon(Tw^im^coMUTico».i333ap 

MuttOH MoiM#«L»C)OfoimiHomoNC(tGiuT0rfiOM«iMw^c^^ uusnsAty MOUSE N.PH*.iuoaMaoTVPCiMimi.iww^cGMPiKTt cot. wnw 

Musoi MouMM«rTeiivamoTEMOFAoemATicvcLWC.NMAaMW mustuuuu yousGNMA-iuQUJNMOTVFCiMim^hmM.ooMPimcot.tsMa* 

Muaora wouw Qcvcg«L-»woiPWTt o&wowoa p i wt opk. oawjn cos. mustiaaw MOuotjiLmA-TUMiJNnaTVPf iMLPHuicmcoimcTvci^ 

UUUS UOU>ESnilJLATORnrOP(«0rT»OFAD9nUTVGVCLA»C,Am^ HUVTUVAH IKXJSf «^»U.TIJBU.M 90TVFS IMPW^ IffWk COMUn COft. MMV 

MUSH IIOUUCniLOCU»,marTWHMfM^COI#UTICM.4300ar UUtnjtAM tlOUUMM*-nJM^«CrrrKIMmh-7W«IA.CGU»l£TlCOt.U»iaP 

UU8WA iiouMMmu&<»Moo«wn>ifliw».t4*ttP MUBTvnn wout6M»MPOft'nnottffM8i<GCt.i4.ia.i).Maar 

W,WIBMin MOWGM«.VBSinOMCKTA<)L08MOeCKTA4«jOOWn.aBC«flJM(>. UUtUP* MOUSS UMOUNAM UftM BKOOMO Unom«MC-TVVC njliMP«3aS* 

MWIIH l«UnMI«,rBMVOWCKTMLO«MOC«KTMItjDOM*10BS«aJMa. IMWAP UOUM WtCVAOOC PfHrTmilf»M.»MeP 

MUei— iM yOUUKTMLOMUMOAOnLinW UURWAM IIOUBS«WCVACKMCPMnEMaBCCX0M«.1tMP 

MUBIMMtt MOUUKTAAOaMtMOROBCincar MTMM RAT UVn M^MM4MCIKK)L0tUJ« MAM. OCWn^ 

MUB*«0Ya£ MOUBS tfO Tl WJMLOUt 0O« AWD »«7tBI> HATMP fUT ANMOOOI UNDNa PHOTSW MRML tSMSP 

MUMflT4 MOUMHIirT1«ICTlWILMSTaWO0«MOaAM(t.«UM fUTACCVB MT CVnyuaUC OniUCTM OM. OOUaEIl COI^ WMSP 

HUBMffT MOUM WWMOW* W«6W«0W0eVtT m W Bf LH A B H t»«l' ) MrNA. IIWP RATACQAI RATACYL-COAOXKMMMnNA,C0UPLCTICn.9M1tP 

MU8»mn Mouu HVfoo(«m««fHoepHonaoe%TAwenMssoac(ExoN«i.t3^ ratacsu fUTSKaerAiHuecLf ALmA<AcmaeK.ca«UTico».MO(«p 

MUnmO MOUM HfMAPOn not MNJCnCNFMrOAHOSP RATA0PA1 RATM.PMt^OaLY00f*MOT9«AaP)0e«MnNA.7Nap 

MUBtRiAiH MouHHTBraa)MLmAoe«.cijaNEMuni-ALfHA-i.ttaaap MTAO^Aia RAT(SPfuai««AM(jv)ALm^i-ACiooLroom9r»(Am^i<AOf>o 

MUBtFNAM IIOUUinB»8IOIiMiFHAae«.CtjONEMU»(-JILnU-lMlBP RATA0PA1H MTM^HA-t-ACeOtVCOffWrmtOBCCOWUTECM AtMBP 

MUHmA7 H0UHMTBVEMNMJHA-7aO«.C0UPUnCM.n«JM RATJIUIC RAT MJHA4^MSIIHN MWA. COMPISTI CM. WMO^ 

MUttTM MOUMMTBV»m«CTAHf«M.TnW RATILAD RATUV»4f)e.TA-AIMCLSVUJ4ATtOCmRABG(NJU|ft««UkOCt»UT^ 

MUBima myuK%MjmwtrB9vw(im<ytMiiicaHAJommKif$m ratals RATAuuwMrmMtflTHvnoM)MnHA,ccifaffTBCM.iMiP 

MUMKAC M0UiEnKAPPAl»rROIXICTWB.VIKA»UN0E00e«:MOKimv>inEOICN. fUTAUM RATSSftJMAiftUMMMnM. IMMP 

MUMOKAfS IIOUHtaKiWPAACmvCa0S:VTtOONiTANrRKHON.a*nif RATALO* RATM-OCUHAURMkCOMaCTICM. IMSP 

MUMOKBC U0UMiaAae*IAimTNSAfUWN0e>KA^MHAHV11JtOe<(V^U#'A«1 RATAL08M RAT ALOOIAM • OOC OtCH t AND » UKTWNBLATO RSOiaN. MW 

MUBKKVt MOUttnAKMWMn.VREARRANaCOKAPfA«HAMV1»4IOM(V-KAffAp|1 RATAftf RAT Pftt QCNi OCOOM ATWL MATmnCTC FACTOA. OOIMTC Cf». 

HUMUAt M0UMtaUMBOAIAClTViaB«:C0NVTWTNBO»GM4T4M RATANM RAT AMIOTMHOO» OOC EXON & ttW 

Muewmct HouuiauMOttKTTvcaa«:C0MTAMrncoi0K4nap ratawo rat a athmi HATWJwcnc fRoriPMMm^ comubte cck. tw 

MUM.t M0UUNTBUUKM-1MnMkC0MPLCTiCOS.W74aP RATAPOAOt RAT APOUKlfROTai M 0B«. COUKCTI CM. M3«^ 

MUSU MOUBC(M.MUaCU.Ut|MUnNEPITBUlKM-SHmA,CCMFUTE.KMr RATAKMM RAT AKUPOfnOTCM C« OQC, CCMPtCTE CM. tfOMT 

MUWia U0UMNrEfLEUKHtaB«:EXaN4.«2»P RATAKMOa RAT AMUfOmOTEHAWOBtC. COMPLETE CM. nMOP 

MUeufSC MOUBCWTmCUraNtl«cmoftMmA,COMPLETECM.ia00V RATAKMI RAT AKl#OPfWTSM M (AMM-IMNA, COiKETE CM MMP 

MUMUT M0Uliia8UUK»«-tl«M<CCWl.ETCCM.«3SV RATAPOAIV RAT APCLtPOPROrEM A-N MMA, COMUTE CM. t4ZBP 

MUSU M0UBENFEHLEUKtMQB«.0awn^C0Dt<aSEaJ84CE.«MF RAT^^OEA RAT APOUPOPAOTEME aB«.COUKCTfiCM.MMar 

MUaUS M0UMtnCflLEUKN3OB4fi,CauaETECM.3l4a»P RATATCO» RAT MRC OH* AWTOP< MHMA (C » tU I UMt T LYMPHOCYTE HWSiP» WtCSP 

MUSUC HOlWI»nEaBJKM4O0«.OOMUnCM3«MSP RATATPAI RAT HA* JCATPASE ALPHA SOftMl CATALVTC SUBIMT MfMA, COMPUTE 

MUSMTt MOUSE Mrr-tMMMARYPnorTDONOOaOC.COMPUn CM 4«1 ISP HATATPAS (UTNA.J(*'ATPASE ALPWM*) WOFOfM CATALYTIC SUSUMTWNA, itOSOP 

MUStfTiM MOUUtir-lPfMrrOCNCCOOCmw, COMPLETE CM. t2MQP RATATPA3 AATNA..(U ATPA8E AlPHiV)l)«OPO(MCATALVTV •MUNrTMRNA, 

MUIKB«TR MOUISmUPOnTVPEDfTKDItBUTMlltlBP RATATPAST RATST0IUCM|H*J{.>-ATPAaEM(MA.CCMPl£nCM.M1iSP 

MUBKTCa MOUSE BtOOtCVTCKnAIMMIM^COMfUrTS CM t9MSP fUT«P RAT iMUNOOLOSULN HCAW CHMN BttCMO PnOTOt ^ MRMk OOmCTt 

MUlKTm HOUSE KERATBIfCPVeiM) TYPE I WrBMCI)IAnFLMI0fraa«i,E»NS RATCAU RATCALCITONMOM, EXONASPCCnCPOnCALCmMMCMSP 

MUIKT9M MOUSE KPWTW fCTP ET MAU WTCTMgMTE FLAMOirSMWff I.MIWA. HATCASS RATaCTACAteNMRNA. 1lt4SP 

MUBLSP MOUSE mil LiPDBMMiaPflOrrEKCCUn.ETEmw.tWSP RATCASSi RATSfiTACAS£MOEIS.E]tONS?ANO».1M«BP 

MOKIPA M0UUM»OCVTfUPeSM)MPIKrrBN0M.C0mCnCM.ttiaP RATCAK RArKAPPAXAteHWIMA.COMPLCTfiCM.mBP 

MUOLOMUr MOUSE LACTATE Dera]«IOOENWE<LO*tA4«OSWEO0S.E)(OH 7. MaSP RATCATV RAT UVCR CATALA8E MfMA, COMUTl CM tNtSP 

MUMIN. MOUtfi MALIC OinMC*MUTENADPOanO<KOUCTMQMf»4A,OOMPLETE CM, RATCSXPA RAT CAHSOaCYPEPTWASEA r Oe« m»«{CHA) AM) MHMfc fWSP 

MUMP* MOUtfiMVaMSASIC^nOmOe«.OIOOC)MaMAND1UKOPHarrEMt. RATCCK RATPRmoCHOLfiCWTOMNMMnNA.M0aP 

MUSMPIW MOUBEHVOH SMC P«OrBNMRN\ COMPLETE CM. aONCMt'MR.IfMBP RATCOO RAT CH0UCYVTOKMM(0GK}a9«,EXCHt. WISP 

MUIMPA7 M0USSMVBJ4SAtlCPH0rmftBP|0O«.EX0N7Ut-*K0PRarGH. RATCCKR RAT tRAM CMOSCYSTOIOW (OCK) MISIA. MSP 

MUBHCOF MOUBEMMTCELLOnOWnH FACTOR (MCOF) MIMA AND r RAM (PAHTUL). RATCm RAT SKELETAL MUtCLECRKATMEKMASSCOMPOtlTCIMULOOMLSTB CM. 

MUMMOII MOMS MULTIMMNBtaTANrmomMfMA, COMMUTE CM. 42W RATCMM RATfrMMONCOOOfl.CCMPlSTI. trtOSP 

UMMCn MOUBfiMCTA110rrHOMSNaM|tfr<|.IHCttP RATCPtIO RATMfTOCMaM)NM.CAHSAHVLPM0SPHATESWmCTAa«iay«.EX0NIS. 

MUtMCn MOUtSMtTALLOnflONEMI(Wr4|aBS.140aSP RATCAP HAT COmCOTKMM&lASMa FACTOR |CRf> IM«A, COWLETE COS. 1 tMSP 

MMUOSt MOUSE MTA^ASCmOLCSULMMRNi^CCMR^ CM. M7SP RATCRVO RAT LM OAHUACRVSTALLM OBtt. COMPLETE CM MOBP 

MIWMilASi M0UtEMHCCLAMtW4ASETA(HAPLOTYPES)OM.C0i#LETBCM RATCRVOA RAT iOm 0 WH C WYtTAUJM MIMUl (PAffTWU. CLOW PW. Q WW 1 . 

MUSMHASa M0UtEIMCGLMSIKMA«CrAaCNi{HAfUTVPEDhEX0Ntl4.4MSP RATCTRPS RAT CHVU07mFtMSOB«. COMPLETE 000MatfiaUQCS.ImS^ 

MIJtMirOI MOUSE MAMflHSTOOOMPAimjTVCLittSiaMOIftr BO sniKMC. RATCYC RAT (tPRAOUE-OAMLEV) CYTOCMIOMS C 0 

MUnMOO MOUSSMHCCLMSIKt-OTIUMfLWaATiaNANnaB«aBC(MPL0TYFEOL RATCVT»4«C RAT CYTOC»A0ME P<4WC S 

Ml—SiW MOUSE IWOCL4SStl««ALPmO0S:EXOI«t-».3t73SP MTCYMW MAT crroCHflOME P.«HO METHn£MCLAMr>«D»MM«U 0B«. COMPLETE 

MUEIiWl MOUBfiHHCCLAStlHi«BCTAa0C(»«MjOrrVPSO|.E)CONtM..I»tSP RATCVMM RATCYTOCMOMEP-MaiC MIMA. COMPLETE CM MlMP 

IWEISEMP MOUSSISCCLAtSIW4JkMPHAaO«E(DHAaorTVPQ,MRNA.fnSP RATCVNIP RAT CYTOCHROME P^HPCN (PNCN iCUCaLE) MWM, COMPLETE CM. S040SP 

ISJEIStAtW MOUSE ISCeLAtttm4WMLPHAOeSKHAPL0rVPC).PAfnUL MIMA. RATCVPS* RATCYr0Cm0MEP-4MS(PHDI0SARSfTAL'SOUCaLQaS«,EX0Nt.MaP 

UUmtKtO MOUtSWC0LMtlKI«ALPHAOa«(Hm.0rrYPC0|.IM7tP RATCYPCNt RATPRE0NBCLO> « «»ALPHACAHSCWTM.B W WrH CYTOCtWOMEPW. 

liSMilfSI MOUSCISCCLAMIHi-KOEMtMAnXnYPEB|,EJC0NS«AltI7SP RATCVPOM RAT CYTOCHnaWS P.4tOO OBC. COMKSTS CM MHSP 

MMMWP MOUSE HHCCLMS I »««OM(HAaOFTVPCO».WSM COMPLETE COOtO RATCVPO RAT CYroCtWMK P.4Mg (P» O CSAI»r r AL-M»UC«LE> OMt, tXXM: tXSP 

MIMWrnA MOUMIMBeLIIStlH>l(O0S(HAPLOrTYPCD|,COMPLEnCMiHttP RATCVPM1 AATCVTOCMMM P^MSA-nMNA, COMUTSCM ItMSP 

UUSfeSflOM HOUSEIMCOLASSIHl«OM(WPLOTVPC0».MftlA.OOMPl£TECOOtia RATCYPOOtW RATNAOPHCYroCHROMB P^tOONDOROUCTAtE WMA.O0MPUETECM. 

MUtMiOr MOUSE MHO CLMS I »««a0«CHKPl0rYPS to. COMPLETE CM. I471SP RATMP RAT WnMNDBi«)t«aPnorreN<MP)MRN^ COMPUTE CM ItlOSP 

MUEMt0A3 M0USSIMCCUMIHKaB«(HAPiarrYPE0».a0Nit74.EXMt4« RAtfiRA RAT T R AIMLA T IOWM. ■TlATCH FACTOR ALPHA tUSUNfT MWA, 

mttOXII M0UtEMHCQA»RBOI0MO7OOi«P0RCLAMIAWnaENEK0M4.T.UI4SP RATOA* RATaA«TASCIO0«.O«ONa.WlBP 

ItJtMtrSfll MOUHIMHeaMKOIONOiOOCFQROLMtlANnOOICMaHMUTSaP RATOAn RATaACTABEiO0C.EWMA.3MSP 

MUnsnUO MOUtSMHCOUMim-IUOM ITXAIC WPLOrVPO.t3IOSP RATEMO RATPRm0OBt9WLJNaaifi.EXm«AM(UMCt.tmSP 

IStSrw M0USiALIlAUhnMMLiaHrCMABa,E]B0miAI«7O0MI0NTOS0rra RATEBP RATEPIMlWMLIWMABCOOMOAWROOBM)EPBCQffSBeRBT0WYPW0roH 

MUEUt MOUttSMfOUCaUXPfMrTmURMAOONreWMStaSCTTWRStttTAHCS RATMM RAT WTCTTMAL PATTY AaOSM OS<aPW qnM« r ASP t M ff tHMSP 

MUtMVSM MOUSE WIS PHOTO OMCOOWEMRWPCOPMOnKP MVS PWOroLCOWiTTE RATfABA RAT LJWi FATTY ACIOMCWQPWOTOt r ASPlM^AMBP 

^OC0aD«.0CMPlErBCM7MaP RAinSflO RATL-FASPOeCOCOOMOUVei FATTY ACeSMOMSPAOTBiAaiSP 

■ ailO«miMCrOflOO«.E»iMM.IHOSP RATFCH RATFERWrWLIOKrCHABItUSUMr.lMULtXISP 

MlWIWrWM M0US8ALPWU«RVfafl0WTMFWT0ROBCC(a«t4.aiMBP RATOAPOM RAT OLYCEWLCeffCS^PHOtPHATE^W nnOOO Mt E jOitflH mtUk. COMPLCTE 

MUENOn M0USSSCTMeMafl0MrTMFACr0ll(SETMiaP)MSIA.11MSP RATOCR RAT OLUCOCORrnOOO RKOTOR lOTUL COWLETE CM CttSP 

MUBNOPO MOUSE »VtOHOMTM FACTOR OUaSIAaMtMTMWMA. COMPLETE CM. tWSP RATQR2 RATMtULMUtEanO«rrMFACT0Ra{IQF4)M«iA00MPLBTECM. 

RATOHI RATanOMm4HORMOI«GB«,aAMCt«aMOIONt.A4MrRONt.n»7SP 

RATQW RAT PMBOMATCrTWCPM lOWOtffH HORMONE) OBfctiaSP 

I. COMPLETE CM HMSP RATONRA MT MORMCNE-fiaEASMO FACTOR OBC. EXOH t. MMP 

^ fiM9»OCVrEPia9«.E]C0N4SH8P RATQLM RAT SETMUCUnOMOABE WNA. COmETf CM. MVMP 

MUBPoatt MOUBB««iC)CaiUARTUSOHA*ma0IPB>O0«.G(CNtl.WBP AATATPO RAT KaWEY OAtSil Q LUTAtm TWXPPTIO AIE mm, COMPLETE CM lOTMP 

MUtPOM MOUSSPt»CSUJtLAAnjM0RANTMBLMmA.t7TaP KATOLUt RAT OLUCAOON OBC, ENON «. WtSP 

IfflBEl MOUtSPtlCSLiJULAATUiORANnOGNHRN^COMPLSTBCMttTTSP MTOLUTIW RAT SMM0LU0OSETMINBPOATERPflOTD«t«NA. COMPLETE CM CI718P 

BOLYOOPNOTENPO-IIMMA. COMPLETE CM. MTONPAI RAT OU MM B WUCL fcUmfc ti» O M U PW U I M N<UALWIAtMIMrrMRHA. 

aHMS.aAMMIUKMrMnWL RATai#AS RATaUA C aMUCL iAl i m SM U S «> P m^ EMO^Aim> S MtBWMHNA 

i>tP«OT»KBUBS.C0WUT« RATOETlYA IWT UW< OLtflAmaNB S-T R WM f W AS B YA SUSUWT. aO M W pqTWIt AND 

■ CATALVnCBMUBrMRMLmttP HATWTIYA RATLWEWQUffATWBNS t-TR SI MM IB YABUSLMff. f l fl NBE PlI ST M AND 

i. SCTACATALVnC SUMMT I SMP WTOSTF RAT UVOI OLUTATMONB S-TMANEPEflASi P SUHMr MRML COMPISTB CDS. 

MmaSTPA RAT OSr-^ 0048 mOOOMaPLACCNTAL-TWSOUITXIMOMSS-imNaPCmaS. 

PWOLS BR BIMWB^OOIgLJTECCfcTMSP RATQSTVB RAT YS30LUrA n tOW S t TW MB WR AS S MWMA.CCMPtSTlCM « 



„ AXTMTYC MrLJieiflUirxntO»as-TmHBP«MBS1CSUMMrrMM«LGtOMMTECM 

S#A L S C| P»0UI C TWMM4(LOOMHJETSCMS 






MTHai MTDEPMLOHKAyVCHAHOOMn^COOMaR 
RATnn* RAT MBULM4JCEanO«frM FACTOR i(MP^OB«.C 
RATWOn MT PW EP WOBi m LM-tKE OWOWffH FACTOR »tm*K lOtW 
RATMSI RAT MSULtM^I) 0048. M2«P 

RATtm RATMBOMiOBCptS^* 

RaTKHOI rat Htm MtUCtUM WEN 

i. piiorni^aB«MoootoR)Niji*niSP RATRBfrs. rat lcm molbdular twpowr frMiin t kbsicwb* i iwsk complete cos. 

EiLAmStUSLBfrflOaGiaNtamt«4IISP RATKPCQAT RATP«0iatOMN.»«ETOACVL-OOA1M0LASfiMML00mETECM.1SMSP 

E.0aiMJ>TECMtM1SP RATLM RATUCTATB OE>f IPW OO PII IS B MRNA . COWIETI CM MMSP 

JBPlOOMPLITECOt^imHASttOOMP RATLECM RAT «ttALOaLWOPROTCMRBCGPT0A<HMnCLfiCT*»#I]|HRNA, COMPLETE 

M4R>ntSniMAtmoeA(tA^tPROrTOi*l«SP RAM nATLVTnOPMtHBErAtUSUNirae«.C0Mt«TECMI7tMP 

MUS8MMP MOUMBmwCEmiUWm* MAWHARYOLWOLVOOPWOmBNlPUrATWEl. RATH RAT LMOUALUPASSMMA. COMPLETE CM, IJHSP 

MUSBUOPP MOUSSBMISU n iJWYOLANDiBSAA C OOBWPORAPWATIVEPOLYPCFtWE. HATIMSPAt RATMWil O B r BI « MO PWO>TP«(tCTUH AOB4S.E)W»A»P4. ttMSP 

MUBiPrrt MOUSE TCai.lP6CV1CtCRB«PRCnAtSMIML00MPtCn CM tOMBP RATMRBPC RATMW4i C SB Bi WHIl PW glEMC tl WS MWSLCOWLETECM WPSP 

MUSTOSm MOUMT-CaiRSCSTQRACTIWSSTMHAMMRNAnMUSffrAl.t-QHNSP HATMPH RATB C IWWBICEU P LN PI HR <LMm jH(P.HMWIA.OOMHJTECM. lOMSP 

MUBTCOM MOUSE TCttl. WBCEPTON ACTTW flUMMA ri WEI MHIIA mOM CTLK Cftl RATMBM RATLnCRMCflOSQMAL KBIOSIOTC EPOnMHVDflOUSE 0nE,EKMI. 

MusrooMc Mouts T-caL mM^ t xM ACTWE n mtm n m n m mam CTL-oaa cca imtmeti iMTi«TAiiorM0HB»4#ffr-t)MRMLaHBp 

MUSTOOXB MOUSE T^fli. BBCEPTCW ACTWS O MSMA Tl WM VI»JS^I» J MIWA. FROM RATIUCm RATFASTMir0SMALJtAUUOHrCHAB»Elt0NE. OOiSi0NTOSOIMISCI-» 

MUBTHSH MOUSiTMVMeYtATESWT»USEHnMA,COHPLETECM.MlBP RATtCO RAT WART MVOtM UOHT CHAM 1 (1B.C« MRNA. f BM). CI OSP 

UUBTWtO MOUSSaBCPORT>ff-lAMTUB«.MMSP AATWT12C RAT WETALIOTMONEM^ AND METAL10>TM0NEi«-l OBCt. COMPLETE CM. 
RATWYUG nATH.UGCNCF0AUUSCLEMVO6MLtCmCHAM7,3MiBP BCM.W DOV«« Ll/Tt»SMO HOWCNC BTTA SISIMTT OM. COMPLETE CW. lM4aP

RATWUR nATMRNAF0ni«UQNC0O8«(PlH)M6BP BOVIWX B0mELimO>M(LH)BCrA MAimMfmCOWn^CM. t29^ 

RATMMOn MTNAI)tP)HMg4AOIONGOXXK>RCOUCTAUUfMA.COMPl^ A0MI8 BOVMC WjaURMN MWfTVia BIBSTANCC MfVU, COitflETC CDS. »1tBP 

RATNNE UTNGHMEURC>M.B«L*8E0«C)UfMf.COMI\ETECOS.17t7SP BOVOPM BCWtti OPSMOeS CXONf WD9 RJM(. ttlBP 

(UTCM HATCNCCMOIMJLMUm*.CaiPLFrCCOS.iMftP KVOT BOVtC PfCPn&CXrTOCM#CVjnOf>HVSM tOOS. tt«W> 

RAT06P (UTOflTE0P0NnNMrMA,C0Mn.fTCC0tt.l4ft7BP 90VPKC KVVC On^ATVPC PROTCMKMUi C IMM. CCUR.nE CCe. tWOP 

fUTOTC mT OflNITWWg TTWttCWOIOTlAfli MflUfc ItltttP ftOVPKIC BOWtS PAOTIM KtMSS C liRNA.COUn.CTC COS ZUttP 

RATCrrCA Mt(m^U^OtHmmmc*PBmCtnjW*1Sf^M£mHKCC^ BOVmiCa B0VttCCMIP«CP9COfTPMrTOJKNA8C.BCTA<ATM.rT1CSUWmt 

(UTOTCC nAT0f«(m*SCAnBAMOVLTRAI«FEHAaCURNA.C0Un.ETEC06. tt06BP B0VP0MC7 SOWIC PnOOPOtaJMOOOmM (PCMC) QBC. CXQN3. CCIOM nCCUQN 

RATOCTNP fUTOXYrOCMMCUnOPHVSMPRKUnSOaCOUPLCTiGMMRJMCS. BOVPFH BOVM PProOCatCOOMOBCTA^PREPAOTACMVKMM. CX0N7. MlOP 

RATPMB RAT PARVALBUMIN MRNA, COMPLETE CDS. M1BP BCWPfl. BOVINE PROLACTIN (PRL) MRNA. W7BP

RATPCC8 AATI«T0CHaN0fl«LPTOPKWVL-COACAf«OXYUM(PCCASC>BnA«UOim DOVPftOPI BOVM PI mOTAMMi MRHA. COMR.CTS COtt. 31«SP 

RATPOt RATPnGrTYM0t8aPMDSrS0MCnA8S(P04URNA.CCMP|.CTCCDS.»«MSP MVPTH KVffC PRCPROPAAATHVnOO HQRMONC MMA. 47WP 

AATPCCO* ftATPHC»PHOCNOUnmUVATCCAflBC»tVKNASC(OTP|OaC.E«>StWOlO. BOVPTHZ BOVME PAfUTHVROO HOHtKMi r M) OF HmA. MOBP 

RATP6C0A MTKWmOtm.W0ffiWcWOROJMAt^mROX 90mm BOVMPAAATHVimDHO(«IC»CaENC.COMPLCTCCOOWQflCOI0NMOFUM(S 

RATPOMA RAT PTTUnARV OLVCOPnOTCM HCVMONS M.PHA ftUSUMT. IIOOP MM 90VMC VTTMIMK-OCPOOefT PftOTlM S {BLOOD Ct.OTTMQ| Ut*4A, XM3BP 

RATPHH RAT PHENYLALANINE HYDROXYLASE MRNA, COMPLETE CDS, lMlftP MVrO BOVINE THYROGLOBULIN MRNA, COMPLETE CDS. M3IBP

RATWa RATPACrrGMKNASiCTVPCIMRNA,aONCPKC-«.3H»P SCVTWM OOM PmrTHROhSMMmA WffTH COUACnCODNO REOMN. tO0«BP 

RATPKCI RAT PROTEIN KINASE C TYPE II MRNA, COMPLETE CDS, CLONE PKC4) OOVniA BOVINE TRANSDUCIN ALPHA-SUBUNIT MRNA, COMPLETE CDS, UtSP

RATPKL RAT L-TYPE PYRUVATE KINASE MRNA, COMPLETE CDS. »»BP SCVTmAt BOVINE TRANSDUCIN ALPHA^I MRNA, COMPLETE CDS. MMSP

RATPLP RAT MYELIN PROTEOLIPID APOPROTEIN (H^ MRNA, COMPLETE CDS. l3a4eP BOVTRNAS BOVINE TRANSDUCIN ALPHA t MRNA, COMPLETE CDS. ttHBP

RATFIPX RAT BRAIN MYELIN PROTEOLIPID PROTEIN (PLP) MRNA, COMPLETE CDS BOVTTMAM BOVINE TRANSDUCIN ALPHA SUBUNIT MRNA, COMPLETE CDS. 1 1WP

RATPOLB RAT DNA POLYMERASE BETA MRNA, COMPLETE CDS. nattP iOVTTMB BOVINE TRANSDUCIN BETA SUBUNIT MRNA, COMPLETE CDS. m29P

RATP0MC4 RAT PROOPIOMELANOCORTIN (POMC) GENE, EXON 3. 7tlBP BOVrmO BOVINE TRANSDUCIN GAMMA SUBUNIT MRNA, COMPLETE CDS. 4tTaP

RATPPTAO RAT OAlMBAA PREPROTACHYKININ A MRNA, COMPLETE CDS. WW BOVTRNOM BOVINE TRANSDUCIN (GTPASE) GAMMA-SUBUNIT MRNA, COMPLETE CDS, «MBP

RATPRLHM nAT(HOOOCD)PnOLACTMaCNC:CXONVAM)RAMtS UaaP BOVTM BOMMS THYM)TnOPM-«tTA fTBH«),COMn.CTi CGOtM SSOUDCfi. MRNA 

RATPRLSOt RATt8PfWU««MW^PnOLACTMae«£(CNSIV.ViM>RAM(S,iaMaP BCWP BOVMC PREPAO^WMNC VASOPflCSSH^CiMOPMVSW n OM. MTftP 

RATPfUOH RAT(SPRMlUC«AMLCV)PnOLACTWURNA.amP BOWPNPI BOVMC ARONMS VABOPtSSSW t NCUnOPHVSW I PRECURSOR. MRNA. tllBP 

RATPRP33 RAT PAROTBOUMOMOlCPmLM nCHPftOTCN MRNA. COMPLCn COS OOOCM OOOCARDIACCRCATK KNASGMSUBWTMmA. COMPUTE COS. 1U7BP 

RATPSaCi RAT PROSTATE Bt«OtiaPnom4POLVPCPTVCCl. MRNA 4 1 MP 0000(8 DOOBRANCflEATWCKBIASSBSUBUNn'MRNA.COUPLCTCCOS. UliBP 

RATPSaCtt RAT PROSTATIC STCROMWKmaPnorCMCI.SXONB 1. 3 AND TRJMt, DOOCIW OCO (CiWME)CHVWOmvPSM MWA. MSP 

RATP8aC22 RAT PROSTATIC BTCAOeWDmPAOTEMCtEXONS 2.} AND TFUm. DOOMS 000 MUM OBS. OOSP 

RATPSaCaP FAT PROSTATIC STCnoe«MDf«l PROTIEM, C» PCPnoC MfNA. SOteP 0OT>eAt 0OATADaTALPHA-KlLCBNaBC.C0MPt.CTCS&OUeC€. 1M«BP 

RATP68PA3 RAT PROSTATIC OTC ROC BM DMQ PWOTCH OBt£ C3(1). EXON 3. 374aP GaT>eAI OOATAOULT ALPHA MXOBMOBC. COMPITTC 8EOUBCE. IMIBP 

RATPSaPBa RATPnonTATICSTlRO«^«ND»IQPROTtM.QaCC3(7).EXCN3 37IBP QGHlWEt OMT EMBRVOMC BCTA^ILOBM EPSI.CN4, COMPLCTS OBC MO R>MK8 

RATPTHl RAT PARATHYDOtO HOniONE OCNC, EXONS I AM) n, •7aBP OCmSBCI OOAT OMBRnrONC BCTAhOLOBM CPSI.ON-II COUP! ETC Oa« AND fUW«($. 

RATFTRl RATP7ntHRNA0COONOAUKDPnarEM|TRM«CnmCNNOUCCOBY HRSI«A1 HORSE (EOUNC) M.PHA-t OLOBM OeC (Bll HAnOTVPC). t33lBP 

RATPTRVB RAT PANCREATIC TRVPStlOOB4 II. CXONB 4 AND ». MSP HRSlFNt HORSE HTCRFEnON^MCOA^ OM. COUaCTE COS, M»P 

RATPTRVI RATPANCPCATtCTnYPSMIO0a.COWn£TECOS.«6OaeP MBimtA HCnSE PfTCRFERONALPHA-l OENC. COfeVlfTC COS. 3>ueP 

RATRDP3 RAT RCTMOC-BMOBM PROTCM (RBP) OENC, EXCN ». 4isap HRSlFNiO HORSE NTfRFERCNOUCaA-l aB«. COWn.CTC COS. atOSBP 

RATwat RAT CaLULARRCTMOLBMONaPnOTEMIfCRBPf) MIMA. COUn.CTE WSimi HORSE NTERFHtON-ALPHM OBC. COMPICTE COS. WOP 

RATRELAX RAT PREPRORELAXIN MRNA. lOOeP HRSIFKIA HORSE INTERFERON-ALPHA^ GENE, COMPLETE CDS. lOMBP

RATRH.I RATASIALOaLYCOPnarEMRSCEFrOR(HEPATCLCCT»4RK-lOB4i.40MeP HRS)»MA HORSE »frCRmONJLPHA4 OOC. COMPl£TS COS. M1BP 

RATRPLIt RATna060MW.PWarEML1tMRNA.C0MPI.CTE COS. TOIBP HRSIM HORSE PfTCRFERON^BCTA OK, COMPICTC COS. M7BP 

RATm^ RATnB0a0MALPnOrEMLMMRNA,C0MPLCTCC0S.MaaP PIOCCK PiaCHaECVST0KPW(CCK)PRCCURSOAMnNA.(CCUPLCTC)ANOrS«> 

RATRnjSA RAT MIMA FOR flOOOOMALPROrHH USA. 3<iaP PUmO PIO PREPROeMCMMJNS MRNA.COMPLCTC CO0B«» SCOUBCC, m»P 

RATRPSn RAT RIBOSOMAL PROTEIN Sit MRNA, COMPLETE CDS. «34BP PttOASTI) PIG GASTRIN MRNA. 4UBP

RATWS17 RAT RIBOSOMAL PROTEIN S17 MRNA, COMPLETE CDS. 4MBP P10NHA PIG INHIBIN A SUBUNIT MRNA, COMPLETE CDS. I27SP

RATRP8M RAT RIBOSOMAL PROTEIN SM MRNA, COMPLETE CDS. 43SP PIGNHAR PORCINE MRNA FOR INHIBIN ALPHA-SUBUNIT. laoaSP

RATBtOO RAT BRAIN S-100 PROTEIN BETA SUBUNIT MRNA, COMPLETE CDS. I4MSP PIOMWAR PORCINE MRNA INHIBIN BETA (A^-SUBUNIT. 3»738P

RATSBP RAT PWOSTATIC SPC n MM C BMDMO PWOTTCM CSBPl MRNA. COMR.ETC C DS. POPOMC PO PnO0PICMCUINOCOflmN(POMCt MAM. COMM. ETC. lOtS&P 

RATSCD RATLfVERST1ARn.-COADGaATUIU8iMRNA,C0MPLEnC0S.4«MBP P10RB.X PIO RGLAXMMRNA, COMn^CTlCOONa WO 3* WTfWNBLATED REOKMS. 

RATBOMUl RAT80MATOSTATW-14GCNCCCUPLCTEC0S.9M7BP PIQUPA PIO UPA (UnOKMASS-TYPC n>8MM00O4 ACTIVATOR) OBtt. COMaETE COS. 

RATSOMAT RAT PITUITARY PRESOMATOTROPIN (GROWTH HORMONE) MRNA. TWBP RABALOA RABBIT MUSCLE ALDOLASE A MRNA, COMPLETE CODING SEQUENCE, 137tBP

RAT80MO AATPflEPR0S0UATOSTATMC0Mn.CrCOS<AK)raAM(. IM7BP RASATPAC RABWTCA- . Ma4>EPe«)Bfr CA4TPA8S MRNA, Ca»n.ETC COS. 4343BP 

RATBPOC RAT CHONOnOrTMSULMTCPnorCOOLVCM cone PftOTfM MRNA. COMPLETE RAOCDPSSU RAaBrTCALCnM^OCPMCMT PROTCASC, SUAU SUB UNH^ MRNA. COIM. ETC 

RATSVF RAT SaiNM.VC8CLEFPnOTCM MRNA. COMPLCTS CDS. «aOBP RMCm RMBrTMUBCUCRSATMC PH06PH0KNASE 01 tBOmtVMS) MRNA, COMPLETE 

RATSVraa RATSSWMALVCSCUPOeCCXONI.HOBP AABCRP RMBTTCAPOOtS BCOOMO C-REACTIVC PnOT¥»t, CCMaCTE COS. t4UBP 

RATBV8 RAT SaiNALVESCUSPfWTCNURNA, COMPETE COS, MCBP M«CY4S0t RM0rrLFVERCVTOCHnOMCP-4«O lIMM, COMPUTE COS liriBP 
RATTATR RAT L-TYROSINE: 2-OXOGLUTARATE AMINOTRANSFERASE MRNA, COMPLETE CDS. nAB»«A RABBIT ALPHA-GLOBIN MRNA. H2BP

RATTOFA RAT TTUMFOmmOOnOWrHFACTO«l-M.PHA MRNA. MOBP RWWAPT RWBITM^HA^ILCBM OS«. COMaCTE COS AM) A PSCUOaTHCTA-l OLOBM 

AATTHVIO RATTHV-tOaC»COOtt«aCBI-8UnFACEAWnO»THV V2MaBP RW>«Bl RMBTT BCTAI'OLOBM OM (NXaC t). COMPLCTC CDS MDLI 7M7BP 

RATTMVIOA RATOEWRMTHV I AMnOSimiM RMMBIAl RMWTBCTA1OL0QNWTTH2M.TVPE 1 AiaE. lOlOP 

RATTHVS4 RAT SPLCBI THVUCSM BCTA4 MIMA. »7BP R/«»«eiA2 RMSfT BCTAKlLCeM Wm42 IVS. TVPfi tMXafi. t3MBP 

RATTTfT RAT FAST BKaETAL TNT OeCSCOOMOTnOPOMNTBOnSfBn.COMaETE RABtFRCP RAflBfT POL V O RECEPTOR. COMPLETI COOMOSCOUeCE. 3II7BP 

RATTfWnM RAT(SPnAaUE0AWUV)T1W«T>4YnCTWtPR£ALBUMN)MRNA.C0Mn.CTSCDS, ftWOHAS RMBfT Kl UU CHAM SCCRCTiD FORM (MIOTVPC VHA7) COMPL OBC. MRNA 

RATTnOlO RATSKaCTALMUSafiBCTA-TnaPOHVaSMANOFBACBLASTTROPOMVOSMI RMLIP HMWr BfTERLCUKN-l (L t)MflNA, COMPLCTCCOS. IM4BP 

RATTSM RAT THYnamOPfMCTArTSHCOUPLCTCCOOMaSEOUBCC, MRNA, HTBP RMMM AABBrrMHCCLASSIflLA(1t/n)0EN6.kmA. 1««aP 

RATTStOM RAT (SPWAOUE-OAWLCY) TWYWOnWPP MI CTA-SUBtMT fTSHSCTA) M<MA. RMMKIItf RMBTTMHC KjA RCOKMCUSSt l»~) OBC.COMaCTECOS.aMiaP 

AATTWU RATALPHA.TUeaN0OCEX0N8M«4AND3rRjlMt.20MBP RAOUm RAeBTTMHCCLASS) AAni/ll)OB«.MflNA 13I08P 

HATTUnil RAT MRNA FOR BrrA.TUMLMTBCTAtt.l»lBP RABPRMZ* RMBfl' MUBCLi PHOSPHOFRUCTOKMASS OM. EXCN Z2. Z22BP 

RATUCP RAT MITOCHONDRIAL BROWN FAT UNCOUPLING PROTEIN MRNA, COMPLETE CDS. RWPQft RABBIT PROGESTERONE RECEPTOR MRNA, COMPLETE CDS. miBP

RATUDPOTR RATUOPaLUCUn0N0SVLTTUNSFCRASCCOMaCTCCOS.M>4BP AAflfTCAM RABBrTT-Cai RECCPTOn ACTTVE ALPHACHAM (VJC) UMA. FROM Cai 

RATUDPOTS RAT LIVER MRNA FOR ANDROSTERONE UDP-GLUCURONYLTRANSFERASE (UDPGT). RACmV RABBIT TUMOR NECROSIS FACTOR (TNF) GENE, COMPLETE CDS. laooOP

RATVPNPA RATVASOPRCSSMNSUnOPMVSMOMADlMCTESMSMOUSUiaMrOM. RWrNFM RABBIT TUMOR NECROSIS FACTOR (TNF) MRNA, COMPLETE CDS. 1t7t BP

RATWAP RAT WMCVAOOICFnCrTnN MRNA. *4«P AMUO RABBm/TEnOOLOftM CM, COHFICTE COB. MMBP 

RATWAP* AATWKYACIOiCPn0rTEMOM.CX0N4.3MOP RABUOl AABBfT UTEAOOLOM 00X6. EX0N8 t AND 3. tOCITBP 

RATWAPU RAT WCYPHOePHOPnOTEM MRNA CLONE. ft4MP RABUQM RAasm/rCROOLOeN MfMA, COUPLCTC BC0UBC6. 4B6BP 

SCAMOl SEAL WVOOIOBM QOtt, EXON 3. PARRAL NTAON I ANDY RJMW DNA. 

OTHER MAMMALS 8HPATPAA SHEEP NA+ AND K+ ATPASE CATALYTIC SUBUNIT ALPHA MRNA, COMPLETE

BOVACMU BOVINE ACETYLCHOLINE RECEPTOR ALPHA-SUBUNIT MRNA, COMPLETE CDS. 8HPCASS3fl S»CEPI«NA F0nAL^HA«J4MEN. lOBABP

BOVACHRB BOVINE ACETYLCHOLINE RECEPTOR BETA-SUBUNIT MRNA, COMPLETE CDS. 8HPCRF SHEEP CORTICOTROPIN-RELEASING FACTOR (CRF) PRECURSOR MRNA. ICSSP

BCVACW CALF MRNA FOR DELTA SUBUNIT OF MUSCLE ACETYLCHOLINE RECEPTOR SHMtCRBaA SHEEP BM AND ttO GENES FOR Bl HIGH-SULPHUR KERATINS. ATUSP
BOVACHRE CALF MUSCLE MRNA FOR ACETYLCHOLINE RECEPTOR EPSILON-SUBUNIT. lAllSP $»M(EflB3C S»CEPa2C084E ENCODING B3 HIGH-SULPHUR KERATIN. KHftP

BOWCALP BOVINE CALPACTIN I HEAVY CHAIN (P36) PROTEIN, MRNA, COMPLETE CDS. S»«ICn SHEEP METALLOTHIONEIN GENE, COMPLETE CODING SEQUENCE. 34afiP
BOVCHTMOt B0m«CPCOe«DCOOMaCMniOSM,CX0Nt.33SBP 

BOWCHVUM BOVINE CHYMOSIN A c nO SMft MRNA. II7VBP OTHER VERTEBRATES

BOWHVUOB BOWE CKWiOSMB ( RO MPS M W 4A.<30t>P CMtACACA CHKK» CARDIAC ALPHAnACTM 0 DC. CLONE LAMOA-ALPHAhM. COMPLETE 

eOWa» B0VfCra'-CVaXNUCLCOTnfi3rPH0SPH00CaTEAASS(CNP)MRNA. CMUCAC8 CMCK0<CAnOUCALPHA.ACTMOB«.aONELMMtA-AC7,COI9tETECO$. 

BOVCRVBB BCVtCBCTACRY«TALLMSUBUNrrBETA«1MRNA.aiaBP CMtACASM CHOce«SKaETALMU»aE ALPHA^CTMaM.COMPtrrE COS. BtMOP 

aOWCRWS BOWWEBETMCRnrSTiUJNMRHA. COMPLETE COS. UCBP CMUC8 C»«ttDiCVT0PLA8MCBETA.«TM0E»«.M««P 

BOMCRtra BOW HS OMMUk C ir V B TALLM MRNA. COMaCTC COB. tliSP CMUCCT* CMOCBI TYPE ftCYTOaABMC iCTW OO*. COMn.EfC COS. MMBP 

eOWCSASiB e0mCALPHA4l-CAS£MBMRNA.C0MPLCTCCOS.II7aP OMACHRl CHCK»MCOTMC ACETVLCHOLME RECEPTOR OaTA SUWMT OBA 33aceP 

BOWCSKA B0WfCKAPPAHCASCNAMnNA,0Ol#LCTCCDS.M«BP CMUCHRZ CHCK» MCOTMC ACETYLCHOLPS RECEPTOR OAMU SUBUMT OB«, 40A6BP 
B0KCYPU1 •^<MCn0»0UALCVT0CHnaMCP-4BIKO1)inNA, COMPLETI CDS. UtTBP CWMOI CHCKS4 UUBCLfi AD0(VUTE KMA8S MRNA. CCIVUTS COS. MSP 

BCMCYni BOWWS CVroCHnOMSP-4iOtBOC) MRNA, COMPLETE COS. tM4BP C»«APOI C»ICKEN«BtVL0W0e«ffYAP0LIPOPnOTEMitAPOVL0LI)aENS.4MtBP 

BOVaA BOV»(CaASTMAI«W.COMn.CTSC0S.aM2ftP CMUSPATU CHCKW MnOCHONORML tSOeOYUE OF ASPARTATE MMNOTRMEFEHASE MRNA. 

BGWaB eoVtlCaA8TMBI«MA,COIM.CTECOS.>KI0eP CIWCNkM CMCKBtCMiaaCBCCOMOCALMOOULM.OONri 



BOWBtCPH B0VMEAORBWLPnEPROB«EPHNJNMnNA.t22SP CWCKBR CHKK04MfHAPOR CREATMC KNASS B (frOC. BC t.7J«. I37MP 

BOVRIRB BOVPtfBASCFaROaLASTOROWrTH FACTOR (RkF)lfftM.ltzaP C»«CKM CHOCS* CRCATMC KMASC-M (CK 44 IMU. COmXTE COS, IsaOBP 

DCWF8H BOVINE FOLLICLE STIMULATING HORMONE BETA CHAIN MRNA, COMPLETE CDS CWCKMX CHICKEN CREATINE KINASE (MUSCLE ISOFORM) MRNA, COMPLETE CDS. 131 OOP

BCVFSm BOVtMPOLlXLESTMULATMO HORMONE BCTASUBUWr(FBH«ETA) MRNA, CWCM11 CMCKOICCMI OMCOOMO FOA ACALA400ULMUCC PWrCM. 4SCAP 

BOWa BOVINE PANCREAS PREPROGLUCAGON MRNA. I ItMBP CWCOMMR CHICKEN OVOTRANSFERRIN (CONALBUMIN) MRNA, COMPLETE CDS. a)7MP

BOMQH BOW« OnOWTH HORMOPC (PAEBOMATOrAOPBS OeS MO FLAMCS. aMBP CWtCPSI CMCKENPAOCCBSCOPSCUOOaENECPSt RELATB)10THE RABONCOOCNEMABP 

BOWIVAI BOWWPnUTAAYOLYCOPWCrrBNHORMOIgALPW^JBUMTOPg.CONATO CWCRYDII CHCKB4 Oa.TA-t CRYSTALjLM MMA. EX0N83TO B«). M7iaP 

B0MLYAA3 BOWC PimrTAAV aLVCOPAOrrai HOfBWNC ALPHMUBUMT flOC. CXCNB 3 CMtCRYOM CMCKBIOELTACRVSTALLBi COMUTEfeSMA 1t7MP 

BOMBAA B0VB(SMRIWF0«tM»emATECYCLASE-«T1MJLATICNaPnarCM ALPHA CHKCRYOS CMOtmOIMTILGOHORPB OATA^I AND tCRTVWTAUJNOCNCS. COMPLETI 

BCMfll BOTME ADULT •ETAOLOaMOOIC.nTaP CWCYC10 CMCKDiCYTOCinOMS C O0«. AtiaS CCIO. COHP^CTC COS. lUOBP 

BOWaO BOWK PETAL 0 WW OLOBil OPC I04IBP CWCYCt CHBX»CVroCHA0MCOB«. AUaSCCIiCOMaCTI COS. U1IBP 

BOW?«AA 0OVMEMTCRFCKMALPHA(BaN.PMAA»OeS. COMPLETE COS. M4BP CWCYMM CMOKENCVTOCHAOME P4ao (PH0«O«ARBirAL-MOUCBLE) MRNA, COMaCTE 

BOWNAB BOVMBBfTEnFBWNALFm(9O-ALPHA«)OeC.C0MaCTIC0S,M«BP CWERBBF CMCKOI CCRBBONCOOM MRNA ACTIVATEOBY ALVMSEimOK FAOOUCNO 

BOWmC BOVME»ITERFERaN«.PHA(B&ALPHAC)aB«,C0MaCTEC0S,10t7BP CHKOAPDHA CttCKBi OAPOH fOLYCCRALOCHYDCXPHOSPHATE DEHYDnoOBtABO MRNA. 

BOWNAD BOVMBNninnACNALPm(BO-ALP»«U»ae«.C0MaCTIC0S.MlBP CMCOBLB CMCKetMKBCTMMLACTOSSf^MONOLSCTM MRNA. COMaCTI COS, 

BOVU DOVMEBnOi^UKMt(t^t)MmA, COMPLETE COS. MP CWHIO CMCKB* HWTONE HI 0O«. aONE LWWOA-CHPI. OOlBP 

BO^KBMC BOVMB EPPCnMN. CYTOItEAATM V 06NE. Cfli*L6TI COS. tTaiBP CWH»40 CHWKOI HWTCNELH*. RH». H3, LWA AND RMtA OPO. MMBP 

DOMtMtHW BOV»CMaHMOLECULARWeQHrcHMW>KNNOOB<.TYPCIMRP4A.3aMBP CMtHaA CHEKEN M8T0NE HS ae«E: COWLETI BEOUOCS A FUNKS. »4aP 

BOWMILM BOMNB LOW UOLCCULAAWBOIfrpfletMiiOaCN MRNA. TYPE t.ltMOP CMW CHCKBI MSTONE HtODC IMIBP 

BOMINMW DOVME HOHMOLECULAR WOOHT CWMV) KMMOOCK TYPE H MMA 3ir4BP CWHBM CtKMBi ERYTWIOCYTE^SPCCnC MSTCNE H» 0»E. COMaCTE COS. SMttP 

BOMCPI&M B0MMLOWMaCCULAAWE)a»ffPRB(MMOafiNMfMA.TVPC3.1MaBP CMU«AA CHCKB< l O IOQLCBM ALPHAAMFMA, MMP 

BOMLMkBP BOVMELMFKAHCUBAANE MAJOR HTRBOICPAOTCN MRNA. COMaCTE. CMtWADAt CMCKSt ALPHAOLOSN OBCS; 0 OBS AM) FLAMIS. lAMBP 
XaACWM )»0PUtLWV«W«MraiALPtMCAIVMCACT14(NMA1».m«P

xaMLsn )(a«Puiusw»MmAR)nLAfivALKrAiaLoaM.MattP 

XOCAMA )UAEMiCAUI0OUUIOQCHRNA,Cl€MltOt.»iaP 



XOCfU JtMOPWLAMOMnULEMPnCtMaOAIMWkCLONBPltean.tttIP 



XaCnUt Xii«MiP«0W)CABHUNTVPCtUffM.OaMfUrSCM.»1EP 

nMAM )UJ>fiVliAW*»O U »WMWMA.COIgLETEC0t.WMP 

]iai««nx )UAM(rMm«AM«MLOtM(*LPHA-ia|M»M.rUIP 

XaMATU XlAMrrADP0imiPMMb0EM(«LP»«T4)MfM^ne«P 

XSLMATU ]UJCm(TWPOI««LPHMbOaH(aLPHA-TI}MmA.MaP 

Xajm XMQPUEUWMtErA4lOIMURMA.MaiP 

XSLMtC I»IOPUEU«V«liAJCRUrML0MIOMEO0WLEre.tMMP 

xaMM X.UVWLMIWH.MrA-l'OLOaMOSM.ttTMP 

XSLMM XLAiWM«1O«0mB»M,0aHnJllCIN.IinP 

X&Mma lUABWiHWEHOgtPWWWOEMi ll«PM l.C0MPtfnCOMI 

xsuno xiAtv«O4i»i«MkccMaETBC0t.ruap 

XaFVU XUmEPnAPMKUHtaAIMM,O0MaEmCOt.411IP 




cfscnPA HomaHO«cfv«<i.pa.yPHaiue)cnpoe«t4.B«coDMO&«CACTivc 

DROMTna OUaANOaA8TEnMTMaB«EATCYTCtOGICALLOCU»mtirSP 
OROtCme DAAMOWftTCnACTNOMEATCVraOOICM. LOCUS MP. tIMttP 
OnOMW DROSOPHILA ALCOHOL DEHYDROGENASE GENE AND FLANKS. tin8P
OnCMDHC DJCLN«»ASTERN.COHaOEHYDnOOeMSEOmB<MLaEAOH4}. CONSENSUS 
DnOMHO OAO«OPHLAMAUnmM4AN.COHaoemmOOMAttOCNEM>rfUM(9. 

onoADHOX ono60Piuuiuuy«N.CONOLOem>noQOU8fiOMfi.tintP 

ORCANTCP DJAANOaAmRRJSHTARAZUL0CU»{FTZ)M1>CANTBMAPE0IA0CMPLEX 
OROWfTPOt DACUINOa*IT»M«TCMMPEOUHOMECmO(ANTP)a«E,E)tON*.SinBI> 
OnOClAt OJMKiWOOiArCTMONMITONECMOIOeCMALWOTPNCIAiANnQMMBNA. 
OnOOAirr OJCUMOOASTBIOAIITOeilMOOOMOTWOPOlVPPTVEtWTHaMtAZaeP 
OnOWPfn DJitaANOaA«mLOCUii7l:MSAT«NOCKPAOTCMHBPltOME.ttH0P 
OnO*«Pfn ONaANOOAITBtLOCUBCII: MEAT SHOCK PnOTOIHBPMOMBIOKAP 
OMHBNn DMGLANOaAaTtflLOCUBtltt HEAT SHOOK PMTm»ePS3 OEM. U2I0P 
0«O«P«r4 DitaAN0aAa(TEnLOCUI*?t:HEATSH0CKPnOTEMH»P370»«.iatlP 

onoHBPTAt oAiBJViooAffTEn WAT SHOCK pncTEM TO oecs. LOCUS iTAr. scat. 

OnomPTOI 0AiaAN00AlTEnHCATSH0CKL0CU5l7C1;0lffrALKSP10QOCS.i0MaP 
OftCISStT OnOSOPMUMOANOOASTBITTUNBPOSAflUELMBfTm wWM 
DfKltCT D.MELANOGASTER METALLOTHIONEIN MRNA, COMPLETE CDS. S3«P
DROMETO DROSOPHILA MELANOGASTER MTN GENE ENCODING METALLOTHIONEIN. KtSOeP
OnOMVL D.MELANOGASTER MYOSIN LIGHT CHAIN, MRNA. «MP
OROUVLALK D.MELANOGASTER MLC-ALK GENE ENCODING MYOSIN ALKALI LIGHT CHAIN.
DRONOTCH D.MELANOGASTER NOTCH LOCUS MRNA, COMPLETE CDS. t»t4NP
ORCNOrcW DA»JH«)aA8TmNOTCHL0CUBOME(EB8MnN.F0RPR0PCRim»P 
OAOOPSA O.HaJMOaASTEROPft»l<NWOaMe.CQMPlE1V COS. WISP 
DROOPSB* O.MaAN0aASTEnCP8MOM.SXCNt.40CSP 
OROPER D.MELANOGASTER PERIOD (PER) GENE, COMPLETE CDS. TUTSP
DROfRO DMBJINOaA8TCnPAnaO«8(PRO»ENCOOi«IASEOMMTATK3NPnOim 
DR0RA81 OAIBJINOaMTERCWtOUOSCME3LOCUSISOfiAAS1O0S.OOHn.ETE.11O2aP 
ORQRASa D MB ONnriAlTCTCMOIOSCMEl LOCUS SNORASt QMS. tXONlSMSP 
0R0nP4» ONOSOPMfcAOMiPOWWDOBOMALPWan»*4»(BP4»fcCOMI^CTB11>OSP 
OROSRvai 0A0SOPHLAMELAN0aAflmS8««MPrrV{ftRV>L0CUSCNAaEOUBCE, 
OROreOlO OMELWOaASTCTT HCI P O MyO SW OME 1. »0»qRM >A. EXON 10A. MSP 

ORcnnois oJMaANoaASTaTROPowyosMMORx«iiraME,cxoNa.ioeap 

OROmOPU O.MaANOaASTlRTnOPOUVO6WiaME.GX0NSI4,COHPLETlC0S.MnBP 
OROrmVA DMELANOOAlTei ALPHA QMS MOOOMOTRyPSSRKSMrMI.INSP 
DROWL OMaANOOASTERWWTE LOCUS OMAMCOOSIOAPOTMnALISiaRMr 
OROVPIt 0iAANOaAtTSIVPtAN0VPIOMCS.M00OS«lvaU(PROrEMlAN0t. 
ORRADHO OROSOPMLAORMA ALCOHOL OMVOROOMASfiOM AND AAfS»«0NE0Oa, 
HCECS tMm4(H.CCCR0P«A»CfiCROPMSHmA(iMlMPnOTEMP».saP 
LEDHHm I ffWMMW> MMOR 6tRJHCTW»Ml WWBRORaUTl HaUCTAAE-THYIWWLATE 
LCrtOHm L.TWOPCA WMCnOHtL THVMPVUTl SYWnWSB DilVO R OPOLATl IMISP 
LPOCRPS M0RSGSH0iCftAS(LPaVPHaiUS)CnPOMESA.MC00MaO«fiACTWf 
LPOCRPC H0RStSH0tCRAS(L>0LVPt«MUStCflPOMSt.1.MCOOtMCAEACT1H 
OPAACTM OIXVTTSCHAPMLAXHACR0NUCLSAR0NAPRAa.WACTWafiNE.1SnSP 
ONOCtAtI OKVTRCHA NOVA cHirPOn«CH0USCIJATE>MACReNUCLEAMes OME rWP 
0NOCM3I OKVTRBHA NOVA (HrPOn«6H0USCIJAT¥>MACR0NUCLfiARCt OME TnSP 
ONOCSU 0KVTnCIMN0VAtHVPOTnCH0USCUATDUCRCNUCUARaNA.Ctae« 
PPA1MM KASMOOMMPNjCIPMMI (PISS) MDMnniMOR SURFACE ANTI^ 
PPAAOilCA FLWUOOKMPALCPANUUCXPORTB>ANTiaMaO<AOS.I,MflHA,OOHPlETS 
PFACAA PLASMOOMIPALCIPAmMCnCUMBPOROSOrTI'PROTEMRaATraANTIOM 
PPACS PLABMOOIUMPALC»ARUUC«aHSPORaeOfTS(CnPnOTVNaEN8.t»3BP 
PFMP1M PLA8M00KMM£rAmM0Pl»M(«MMCO0MaANnQetCAU.V0fVCflSt 
PPAMtPAl PJMC9ARUMMtTVS«4«CHPR0rTEMOMS.EX0Nt.CU0NEPfmP-N. 
PFAP1M PLAaMODUUPMXrARUMaSCUTCKnPIMODCPORHEROSOITE SURFACE 
PfASAiT PLASMOOM* fti»MM flSCLATI PM?) WNnOMOME. 373a«P 
PFA8A7 FU80M0OIUMFALC»AftUM{SCUllNFnSANn0MOMfi.CGMPLETIC0S. 
PFASHAAPR WjaM00MaFM.C>WUMMHMAP0RWMLLMam0S < ALI>N»<ISCHPROTlW 
noes PjmOWLCSISPOROZOrTl SURFACE «aiQM(CSPnorTWQ OME AND fUMtS. 

I PlASM n C1imtMOWLCBIC«CUM 6P OROeO>TEANnOM(C»)OPg.COW.gTE 
PLASMOOU* LOPHURAS MSmONS-RBM PROTEM OME. eCMfUTE COS. 
P.PVRN.« (RRm.V)lUCnRASfi OME, OOihnETE COS. t3S7SP 
NJttUOOtM VOajl CACUMSPOROOOai ICS) PROFTBN OME. COMPLETE COS. 
SMtfflM AmMU S«LtM OME FOR aONOATKN FACTOR 0-1 VtLmuEXCNS. 

ttmetm sRscswai»(AHTMU)aoNOAT)0N factor cf-i^alphai£ma. compute. 

SUmJSA ST>L0NVCHMLaSMSALPHA>TUBULM(IMS.OONVLSrBC0S.taa«P 

SPOCVQ fUBHn.YaMEP0RCM(OAPRaTBLSM3OP 

SPCLCC aESHaY(SAR C OPMWPCTMRP«»LBCTW«ALPHASU»UMT>MRNA. 

SUPMtSAI SEA URC>M(PJ«JAf«S)UTVMSrai«IBA-1LATlMnHA, COMPLETE COS. 

BUPMStAS SCAURC»m(PiajAf«S)UTlMSTaCWMUTVHRWLCOWLfiTEC0S. 

SUM8M3 SEA UnCMNlPMJWS) LATE MSTQNIIftMHmA.OOMaET« COS. 

SUPMSttt SSAURCMN(PAajAMM)6P0«IMSTONEW»-tHlWL COMMUTE COS. 

SUPMSSt) SEAURC»W(PABJWS)UTlMSTCNBieMMRNA.4SlSP 

SUPH6N SEAURCHM(PiaiAnS)IARLYHnaMBANDPAmWLrSPACfiK»1SP 

SUPMSH SSAURCHMCPJMEJM«)H«rONi(KDOB«.COIIPlSnCOS.4t«P 

SUPMSPt SEAURCHH(PJ«JM«S)KUOMi.C0MPUT1C0S.7«CSP 

8UPWSPA SEAUnC*«(PA«JAnS)MSTONEOOMPUX:(r^.OBttSM.MM. 

SUBAC1S1 SSAURCIW(SJ1WNCtaCANUB)ACTMOBS.CLmBSFA-1S.C0MniniC0S. 

SUSiClU SEA URCMH(SFR«CaCANUB)ACTMO0«.CLCi«BFA-tl^ COMPUTE COS. 

SUBMTM SEAURCHM(S.PURPURATUB)ACTM0IM.aQMPSPOt7.rMSr 

SUSSND •SAURCtWfSWRPURATVStS'CMUmULOOWLSTSeOS.inaSP 

SUBM8EB4 BSA URCMN(SMRPunATUS) EMV HSTCMSOMCS; KtA. MttP 

SUSISTA SfiAUnCNNISrALLOrM0NBN(WrA}MflNA.S««P 

6UTM8CS SCAURC»W«.PCTUS)fiAALV-STAaE»«T0MK)0O«.10MSP 

BUVH8LM SfiAURCM4MCTUS)UrE«rAOSMSTONEK»ANDHlOMCS.StaOSP 

SUVMSLM SCAURC»mCL.PCTU«)UTMirAOEMST0WKI«eHIOMa.SQSMP 

TETACTA T.THERMOPHILA MACRONUCLEAR ACTIN GENE, COMPLETE CDS. 14SliP

TSTW) T.1WMOPMUMSrrQNiHMae«MCRJIM(MOHBQI0NS.t1S«P 

TETHSHt T.THERMOPHILA HISTONE HI GENE, COMPLETE CDS. 11MSP

rmiOR TRYPANOSOMA BRUCEI MRNA FOR FRUCTOSE BISPHOSPHATE ALDOLASE. I»<SSP

TTttCWttO THWANOSOMA SRUCC 0 WMKN SE CAUWOWJ* OMSS. TAWDOR.V WPKATl D. 

T I MP OWC T1IVPm0S0liASRUCBaMBSSANDCF0RPH0SPH0aLVCtRAnK»M*S(PaK) 

ravTusM rMucefs«owMSEALP*MicoHfurE)is«sfrAnauLM(ia«S3t««sp 

TmSITM T.SRUCeWVttWr SURFACE OLVOOPR0T»1 IT 00tPLBnMmA.I7iaSP 
TPBMOIIZ TJRUCB VARWWT SURFACl OL WA JPROT EW ANTAT 1.1. EXP REBSW NLSPtg) 
Timsau TBRUCCI(I.TAT I JtVARUHT SURFACE aLVO0PRCrTB*MnNA.t«MSP 

TRcm rcRunmoasLMcsp 

UCAIfl URCCMBCAUPOFItSMaL08S«MnWL00MPlCTIC0S.71«P 
in AROMATIC



APLJMM iW.YmCAt»0«SS C A»i0UJUSC»_R<»mJtrEDWIOMRN<LO0MPUTlCOS. 



cacoLto cAAawi#aM[TooeoouAaMt(oat)OMi.oaHFUTicos.iMP 

CELOOLSO CnESWyOWTOIICCOaAOMSfOOt-g OME. COMPUTE cos. 1TWSP 
CELMPttC C.anM«HEArSHOCKPR0rTmOMSfHSPl*4AA»CHSPl«-lV441JSP 
CaMSPIfO CELiOMSMArSHOOKPROrEMOMBIMIAMIIS^COMPLETECOS. 

caMSPIO CnjaMS»AR.SROTeL MAJOR SPWM PROT» MRNA.O0lgLEr 

CELHSPaU O.BJ0ANSWIliW S I U .ISU0RSP«MPROIPNOME.00MPLETI 
C&HVUC CUaMNSMAWRUraSM HEAVY CHAM aa2VIHUNC44tOMB.« 

caviTs c.QjaANSwrai. oQ —is«vpt70A)OME.r — 




MTAPSR AVENA SATIVA MRNA FOR PHYTOCHROME (Am^>
MTAMR AVENA SATIVA MRNA FOR PHYTOCHROME (A^ A TBMP
ATHADH A.THALIANA ALCOHOL DEHYDROGENASE flPW. COMPLETE CDS. ItWSP
SLYAUl SAmVAUURW OM E . COMPUTE COS. imSP 

aYAWTAA BARLEY ALPHA-AMYLASE TYPE A ISOZYME MRNA, COMPLETE CDS. 1SSNP
BLYAAffyWO BARLEY ALPHA-AMYLASE TYPE B ISOZYME MRNA, COMPLETE CDS. aCMP
BLVAAIVO BMUV (TYPE At Hm PI aOOVMS OP MJMVAimASEOM. COMPUTE 
SLVHOnS SAfUY SI NORDOiaeC COMPUTE COS. tMCSP 
BLYPAPt SAISJY(KYttflARE>PRO»ISUNa n AS E iPRa TEASE SiMI U R l PAP (| WP 
~ RAPCSaDCBJ«APUS)NAPWMnNA,CaWUTECOS.nSP 



CLMCVCA CLMICVCLMAMNA,C0MPLETEC0S.>1«3SP 



ctcma cwALospoiwM A cn MONw ome for ■op emcu mn svwthetjse 

CMCOHA C.ENSIFORMIS (JACK BEAN) CONCANAVALIN A MRNA, COMPLETE CDS. wmP
CPAPAP CARICA PAPAYA PAPAIN MRNA, COMPLETE CDS. tlKBP
CnCTWA CMMmX9UCNA8nCMUm»CT-A.lTUBat4 0e«.COIAmC^^
CACTWa CKAMVDCilONMnCI*MIOKTMTVKJLNOM.COI«1^ 

D9LCC9 OJIfVOIIUttaOlBCTWtUWNnBIWOIimA.CCMPLmeM.100W 

Fsocur fAauMPsi(nMau«)CUTVMBf MnNA.caim.mcot.inftp 

UWBtt Lg«MOMACM.0R0WVlIAaAKimarTlNOB«.CCMfUTiCCt.m3» 
IMATVIA IIU00nnWOI0tUtT9-10O«9C00MaB.0NaAT1CNMCT0A1-«LFHA 

loucTia uMZEiCTWiaM#uet),ooyruTVCDt.att«BP 

ICEAOHIF UMZSilLOOH0LDemMOOeuSi(M)H1^Wm^O0HPUTSCCt.tM0OF 
lOCAOHIt MMKilLOOHa()CHVimOQetM(A0H1^OeS.O0ktfUTIC0t.MMt» 
MZEAOKW ZU MAYS AOm-N OWE FOR M€OHaOem)nOOB4A8CL3»36aP 
MZCAOHM UMZiADH>-NMnMPDnM.CO»taDCHVDnOOCNAttt.lM0ar 
UAin ATffTAOP TMMLOCATOniumL IISMP 

UAOt BOOtPCRMOLVraM-t PnOTCM MRNA, COMPLCTi COS. tlTBP 

uAta (ZCA HAW) mrroM to oec (Mscn, couacrt cot. looi tp 

44ZiK)C4 MASS (ZEA MAYS) MfT0NEKiaeS(H3C4)LCaUPLCTC cot. 12M8P 
MZBWCU UAaiaCAUAVt)H«TCNiH«Oa«(WCU).COMR.CTiCDe,1i36tP 
MZEH4C7 UAOC (ZCA MAYS) MSTQNE H4 Q0C (MCA COUPLETS COB. tHBP 

MZESUSYM UAtzfitucrioeitwmmaBC(e»«uMt0t,coMPLCTf cot.MaatP 
Bf-t.2



UZCZSltA UAIZSl*KDZEMPnorTnNaMANORAMCS.C 
UZCZEZ2A UAnnK0(IMhMJ>KDtZS«mareN1.MnNA.«7BP 
MZCZC22t MASSnP(0#mMt.MK0)ZSNPK>TEM3.wnNAnitP 
UZGZEA20U UAOSimynNAfCUmiAIO.VIBP 
lOCTEAMM UASE2EHHnM(CUMEA3et.7M8P 
MZeZEPCMI IMIZSZEMHCAWCHmMnNA(aONiPCM1>.M3tP 
MZEZEZAO l4AIZCZEMO0lStCLONiZ4).13MaP 

NCUMM N.CIW8AAM(NA0r«PGCtFCaLUrNMT«0eM>MXie«ABC)OO««RAMtt. 
NEUATPAtS mCWttA WJttM* iranWg ATyABt gag, COWKBTt cot. MMOP 
NEUATPC ItEUROtPOfUCftttaAMTOCHOMMtN. ADPr ATP CARMBl IMTtP 
NEUATPPW NBIf«0tPOfUCIUStAKA8MAyEUBIUNiH*W^TPAtS(ia4C.Cai»LniC0t. 
NEUATPPAO MCMStAUmiCiaCMATPtmTtMtfiPflCrrtOLIPDtiJtlMTimA 
NEUHMI NEUnOtPORACfWaSAHtTONiKIQeCmtP 
NEUHW «unOtPOnACMMA»aTONiHtO0C.7Ma^ 
NCUMETC MCfUtSA OOPPCnyCrMlOTMONOIOnB. COUPLETS cot. ttOMP 

NEUPvm N.ciMsaAniwoetteoxMaonaTiotcr'POCCAfsoxvLAsc.inaBP 

NEUMi NJCMBtAOM OB* e«OOt«»»W«VDACtHMMKrS OBIVmATAtfi. aSMBP 
NEUTUM M.CWStA monUA. mjTmT) tETA-TWJUM OCNE. MMtP 

NEUUCVCA KCIUSSAWrOC>0«RKfO««JLfUn6UBtMTOFUtiaUNacrTO 
PEAMN1 PfiA(P.tATMJi«M«MNt<PAt)Oe«.MttP 
PCAMNIM PCA(P4ATWWI>li«UMM1(PA1)URNA,CCUPLiTEC0t.<0ltP 
PEAMHI PCA(P.SATMJI«MMMNt(PM)IMNLC0UfUTIC0t.M7tP 
PCACAtW PEAtPJATWlMOPli W fl BCOOWQUAWRUOHT-HAHVBTHOntiaP 
PEALCOA PtA (P.8ATNVM I ,) LEOA SNOQOtM ItfllMW. CCMPUT1 COt. tSIOtP 

pEiMJWt PtAnBctaMfOMmaatfiMPHOtPwncM»oo(VLAti«uAa»UKJNrr. 

PGCVt P.C»«V800B4UyimaOCBC00MQa0PEMCtlJNNtYMTMErA8fi.«MttP 
PCrCAtll PCTUMAQ0«PQnCHXMOPHnLA«t»mMPnorrEMCMU.1O1tOP 

PETCABa. PEnNAd0CPonciccncpHvu.A«OMOMOPfiorrcMcwnLna7tf 
pfiTCAA22ft prriMM oec FOACHLOMPHniM ••«•«» pnorratcM tin. 1IITBP 
pETDttn pcTtmooKRMCtconopHni.MiMDmpnorcMCMxi.nMtP 

PCTCMVIfl PCTUMOaCFOnC»tX)nC9P»niLA«IMlMaPnOTGNCM«ini173tP 
PHOCM. PAmLEV(Pi«flTB«OC>W£OMEtVMrHMCa0«.URNA.M3taP 
PHVCHJ P.WLaAmCI«T*MtlUnN^O0UUTCC0t.111SP 

pMvoECA p.vuxMWPWTOHaiMMLinimocNcecooMaBirnmiaaLi^ 

PHVCLSCt P.VULOAWt PWrTOmMAQOtUT— « OB* BCOOMO lEUCOWOOtVTMATMO 
PHVUA PI«SCaUtVUAtf«t(iaCMEYiCWtLE(UeiOOL0tMOe«.C0UPlETECOt 
PHVLGCT P.MJLOMM LCCTW OB*. CCMPLCTI COt. t«MOP 
POT7AT POTATO PATATW WMA. 1431tP 

POTPMtl pmATO(tOLMMITll»eiOtUM)UflNAtPonPnOTCMASCtMrronl. 
RCCPPffCA CASTOftBCAMMMMPOnnCMPMCUfVOatlMaP 

ROcncM RRMutoouMUMtoBcrenncMPnccunton.e*4tp 

SCOiat SCWOPHnLUMCCTMWilO»aB<WVCtVB>MFWUffwa.C0miTlC0t. 

upR» wwrrc t Mmo u <tcp« pwatwbs) fgwaoxM pwcftjwtow tmtK lowtp 
Mppcv tfca«pf»CTPW(WHw a >MPwiitPLA t To c <AWMP«ieajwtowypMA.7ntP 

BLUCM> O-OaOOWEW CTCUC NUaEOmOE P HU tPI MJ* ! l UIAIi MWML COMPLCTC 
eLMCPnOBA OCTVOtTBJUMOMOCIOEUUimMmnCWTOCPAOTOUtSllttaP 
•LMCYBPftO SLMEMOtOtDiMOCIOCUttCVVTCMEPfWrrBNMIIMMLCOUKETSCDt. 
8UN>1I SLMEU0U)|D.0«COnEUMPf»TiU(DtiaM.COUPLCnC0t.SlfmP 



SGWQLVAIA tOV«EANaLVCMMtUIUmA-1AAWtXPMCIIM0flMmA.C0MPLETIC0t. 

BoraLVAM to«wGLvct«iAtAi«nauNfrMfWL COMPLCTC cot. 1M*r 
tonra.vn tovifiANaLvcMMnjnMrA-»«-tAunNA.i«DitP 

aOVMSPm SOnMMlOLVCtCliAIQI^UMtCATtHOCKPMFfnNOMtaiMtPtTMI. 

tovHtmn tovK«tK)iwt«uA)QLCwMiir»«ArtHoo(pnoTCMoes(aw«pir.tij. 

aOVHtPt WnKMHGAr-BHOOKOMHtMTl.tlMP 

tOVWQU tOytEAM (OtlCME MWQ WAT tHOCK PWOTBH {OUiaPlT.Ml OCNE. l«a»P 
tOOtCX OOnEMLEOHBKMLOtM&tOBCtltttP 
tOVLMI tOV OD IW LEOtmoOL O Wl O O ■ I (LtOt. 1 WtP 
tOVLCA tOirtCilNLGClMMt>0e«M0rUMCt.t1lttP 



tomooaoi tonncmi«]OUJMtiWMBcooMaA«jau«roFuf«CAtci. tiMP 
tcwwpi ftov w* w M PWPiopap«ooMOAPfWji c itCMpnoroi.cotPLcrecot. 

TMTHAUI TiWtCUJPnCPtWrnWUIMTVUMVWSXPMBSCOMLOOUnii^ 

TotATm wecrrmNA maaQMrouA atfi-i opk bccomo bet* tti»uNff of 
TotpRpR mcojvm rmt em t m m poh pathoo chcm wl o t to (PI» pwptcm it 
Toompco ToaAooo#iTAMcit%ttO04cr 



TOUnCtA TOMATO |LGtCa»mjl4nM.0tS>1jM«PH0tPIMT11i 
TOMCtt TOMATO ILBtCaBffm WmOtC-IJ f PWOtPMATC tmmP 
TOMRKtC TO«MTO«-EBCIJLB(rUl|fW(JLOSS-1J«VIIOtPMA1CtOtnP 
TOMRBCSD TOMATO nUtrCAmVLMCIH«ltUMMTLCBt1TURNA.7W 
TOMKtC TQIMTOMJiPCAmoanLMCflMMltUMMITLBttlir 
T OMWPI TaHATOLtt#M 
TCMMWn TOMATOt^BI 

TOMNVH TQMArOlEAFI 

VfMXM VBtAMALCOUMMtOnCLGM, 




WHTOLOt 1WMCAT(TJEtTTWH 0 WW fM WPM OOg. COMHETC COt. » 
WHTOLIA «VHEAT(T.AarTMM)«LP»M>tVPCOUMMO0«.UmA(COmaONC 
WHTQLIAA M»«ATBTOn«aCPflCTONaLMOti«LPMA-tYO0«.afW 
WHTQUMA WHEAT (T'MtTMAtAimA«ETJUIlMflMCLAttA«lffML00MfUT1 
VUKTOLUBB WCATrT'AaTmtt ALPHMCTA OLMOttfOeS. CLONE PWI Sift, OOftPLETI 
WHTOLUK mCATfTACBTMaDAimkMrrAOUAOMCLMtA-VIMMLCOUPlETC 
WHTQLIMO <W«AT(TJtfinMJM)AtWWErAOUADMOP«.CLOW WMIW.COUWJTC 
WHTOLUM WHEAT (TJWBTMt^ALmAiMTAaUADMClAM I OCNE. CUM mrax 
WHTOUMF 1WWT(Ti«VnVU4M^MA^CrA<lLlADMCLMtAIUnNA.Caun^ 
WMQUAM WHEAT (rjtftTNUU)ALPHMETMlLWOMaAatMVMfWI^ COMPLCTC 
WHTQLUIBM m«AT(rAC»TMM)ALmwiEr*OUA0M0LABtMII«MA.C0WLETC 
WHrOUOaA WICATfr.ACtTMM)Q<UM* ni,IWlMCtftitHMBH<LOOM>LETCCOt. 
WHTOLUri WICATOCNCPOnMMWaunBMtUBUNrr.SlMP 



vBtPOXf vcMT (C.TnoPCALtt) pon aom bcocmmo Acn-coeimc a oxidam i 

V8CACT VCAffT (t CEACVttlAC) ACTM OBS, COMPUTC COt. IIOOBP 

VtCAOC ftCCREVttlAC ADEt,? OCME BWOOMO OLVCMEMMDfi RttOTCE Mt ttP 

VSCA06) VSACT (tfREWlA^ ADO OM 0COCMW C-l-TCTIUHVDWmXATC «Ua»P 

VaCA064 VSMT{S£B«V»IAOACS«OEWCO0MaP0ntMiaP 

VBCAOH) t£cncvBucAOMiaMgcaoMo«jOOHaDe<vDnoaeMun,iw«p 
VBCADM vcArr(8.ccncmAQMOOHaLOCKn)*K»0MCitADH)oos.ri«p 

YtCAORI YMT(t£0UMtlAOALCOMXOCHVDnDOOIMEiaB«.AOfO{aiUCOt6 
VtCAPtO VCMT /n owe POn CVronjWMATC A«PMVn(L-T1«M tVNTHCTAtS lAtPflS). 

vacAWU vcAtT(txsf«v«iAQAmaaa«mooOMOoi««rHMCAnM«m.mw 

VtCAKW VCAST(tjCCf«V»UQAnQMMOtLCCMATCLVMi(ARO*»OOCnM8P 
YVCATPAIfT VCAST(t.Ce«WWE)UrTOCHONOMILAT?ABC1ilLPHktUBU«TOOC, 
VtCOMt YEAST (tXetCVBU^CWIOMeiOOOMO AN N«N0ACOPCf«CASG. 
VBOCWtO VSA8TCW1O0S FOR MOtWCPBtKASC. tower 
Y8CCAR YEAST (S.Ce«VISM)AnaNMC(CAAUOB«, COMPLCTC cot. ItlSP 

vtcctPi vcftCT fo ccncvwirtt] fort ocrg twitf 

VSCCap* YCAflT(t.C0«mtAONUafiAnaBSC«P«POftCYTQCMaNSt.CCMFL£TS 

YBccocao YEAST cat omtKMCYcu owe coca UMtr 

VSCCOCa S.CERCV«IACCaj.OfV«IONCONmCLOeCCOCM,CCI#LCTCC06. miBP 
YtCCOCJI YEAST OCOat one flCai0NP0ftA0emATCCYCLAtCCATM.VT1CCe<TTIG. 
Y8CCOC7 YCA8T{8£CflCV«lAE)CSaCVClCOe«fCOCn,COMPLCTICOt.noaBP 
VBCCOCA YSAST(tjCCRCMBIAE)CDCaaMfNWLVB>NDNAnEPUCAT)0>«. 
VSCCOC* YEAST COCtOeS FOR tMALIQA8C.aMI«P 

YBCCOa YSAST(8£CRCVBlAC)CHnOUOtOUEMC0miOUm|C0O)fleOJCM.4mP 
VBCCrrv YCMT(t.CB«VWAOCnioe«BC0OMaTt«CYT0PLA8MCF)0RM1tMW> 
YtCCUDl YEAST ftCEREVWIAE) CW>1 00C BCOOMO CALUOOULN. COMPLETE COS, 
YBCCOU YCAST tt.CCnCVWM) CYTOCKROM 0MVA8C SUMMT ft (COMft) OENE, 
YtCCOU YCAST |8£SnCVBlAE) COM OM BCOOtKI CVTOCMIOUC C OXtOASC I lOMP 
YBCCFAt YCWr(t.CCRCmMOCPAI one BCOOMO TIC CMMIVLPHOtPHATC 
VBCCPAl YCAST rSCeCVClAOCPAt OENE OOOMOP0nT>CARa»»S«VC*C 
Y8CCPAX tjCCflfiVBlACCPAIOBC BCOOMO CAMMOn.-PH0tPHATi»VHT>CTASG 
VBCCS YCASr(S.CCflE\«MC)CrT1UTCtVNTHAtCae«.COim^COt.M27BP 
YBCCUPl YCAST (tjCBtCMUAOCUriLOCUt BCOOMO COPP»tC«>TWAM> Ml 
YSCCVC YEAST ACeCMSME) NUCLEAR OBC FOR CYTOCHROUe CI PRCCURtOR 
YSCCYC1T YCAST<t CCnEVBIAOir«OSUWMT0FUBnUM0L'CYT0CHR0UECT«7BP 
YBCCYCIX YCASr(S.CCI«VltMCtltO1-Cyr0CmCUEC{CYCI)0B4C«W 
YtCCYU YEAST{SCCREV«lAOCVTOCWKnCCO0aOA8StUMMTIVaeC,1ftftttP 
VBCCYC7 YCAST (S.C&CmaAE)nO«-CTTOCHRONCC(CYC7)OBC.COUPLCTC 

YtocYcn YCAST <8> CMvwAC) uciouMa<rroc»««OftC c nrauCTASS h kd iomp 

T80CYR1 YCAST (SACREVBIAC) AOBfVLATC CVCLMC O0C (CVR1). COmfTC COS. 
VtCCYlU YCAST OeC FOR UirOCHGN0RULU*KACTATCCYT0CHR0UiCtU2tP 
VtCCf 1A YCAST (t. CBCmiAB TVt OBC FOR aCN(UT10N FACTOR EF-t M.PHA. 
YSCEF1AA YEAST (S.CEREVISIAE) GENE FOR ELONGATION FACTOR 1-ALPHA, COMPLETE.
YtCSFlM YEAtrr(S.CCnC>mM}^l-ALPM-*(aONaAT1QN FACTOR t-MMA) OCNE 
YtCe«»A YEMTl8.C€nEVISWC)ENaLASfiaBC(OLONEP0OM)ANORJWM. 
YICBCt YEAST(8,ceCVISM)CN0UIMaBC|CL0*CPQ«lt)MDflA*M.1M7tP 
YBCCUU YCAST(»£e«VSlAE)0AUOGNfi00OMaP0RAPOtmvCREaULATOftOF 
VBC0M.7 YCAST (8.CB«vmAC}aM.70B«S BCOOMO MMtP 
YSCONJOO YCAST(SjCEnEVWAQaMJ0flEOULAT0RYOe«.COUPLCTEC0t.M87aP 
V8C0CN* YEAST <8CCnEV«UE) OCNI O0C. IftMBT 

Y SC O CN C YCASTtSCBCVCMOOaCRALCOHTTMLREOULATORVOeCOCNIWO 
YtCQOm YCAST (t.CBtCVBlAS)Q0H1 oec BCOOMO NAOPHOVeCGMrOLUTMUTE 
YSCQOHN YEAST (S.CCTEVaiAE) ODHI OBC BCOOMO NWPII OCPOCOff OLtfTWATE 
YtCHttt YCAtT{8«eKVIStAOH«TONBHS-10eC007tP 
YSCM2tt YEAtT(8,CEICVnUC)H»n»fi WMOeCiMSf 
Y9CM»«t VGAST(t.CC«WIAE)MST0»CCOPV4K)MCHlOBfES. tMOtP 
VBCMMCI YEAST (CCeCMMAOMSTONECOPV-IHS AND HI Oe«t.tl00flP 
VtCM^ YCAST (t.C8CWtM)H»mNCPBWEAtE OBC (HniCOUfLETE cot. 
VBCHt4 YCAST(t.CBC>MtlAOMt4QOC.C0WUTCC0t.47ttSf 
YtC»«A YCASTWAOBCFORMPCOKlMSCAiftAPPM a TOCttOiOtCUCVt. mitP 
YBCWB YEAST OBC FOR HPCOKWOtCtHA f PMO TO CWmCaCUEVCIMTW 
YSC»M>L YCAST {8.CCnKVWAE) HATMO-TYPC LOCUS »«C<ALPHA. SUSP 
YtCHSPW VCAaT<S,Ce«MtMOHtMOOSATtHOO( NOUCaLea0CCOMPlCTE 
YEAST (SjCe«VBME)HrS1 oec BCOOtCCYTOPUSMC WD SaMIP 
YEAST (t£e«VWUO MKI OeC OOOiM FOR HDCOMMSS P4,CCun.CTS 
YtCMOCt YCAST(t£e«VBtAOK«»0eCCOOMaF0R»CX0KMAtiP«.CCMPLCTC 
nCLVS YCASTOeCLVtF0RACET0LACTATESYimWBS(GC4.1AII).3ltaP 
YBOLCUt YE<ST«t.CP O WtW S )maoeCOOOtWFORtET/M> O PW CI P m ML A TC 
nCLElM YEAST(S£EfCWUS}LCU4aeCOOOM(IF0RALPHMtOPR0rnilMATC 
YtCUIPI YCAST It. CCBCWSIAC) Ml WjMUO. m*t PB E PWOTOXM OBC. tOtfltP 

vtcMtprr YCAST It. eetcvmAS)MtpuMM«)PRB*RaTQ)(M OBC. couuTC cot. 

VtCMATA YEAST (8£EnEVBlA0MATM0.TYPfi LOCUS MATA0NA.»M3iP 

YBCMATN. YCWr|t.CEnCmUOUATHa-TYPCL0CUtlyUT-ALn«A,tMMRP 

VtCUat VCMTftJC«CV«IAQMaioeCBC00M0AlPHA^MLACTOtaiASC.tttlBP 

YlCUFAlO VBAST{tjCBCMtlAOPte«QMOI«OBCUF4iMA'I.COUPLCTCCOO. 

YtCUFAM YCAST (tjCBCMtW^PHeCMOfC OBC UFALPM^tOOWLSrS COS. M70P 

Yt cMW * YCAST (t crasvituopaviA^iCMapncncMoec. COMPLCTC cot. 

YtCUBSSt YEAST (S.CaCMSMBIIS88l OBC 00MMTBCn.I2lMP 
VtCIBW YCMT{SCEICWtlAE) MSWOeC BCOOMO ISTOPCICIWL I RIP l U PIIWIYL- 
YtCOOCD YEAST (t.Ce«VltM*0«)UnAI oec COOMOPOROHPDCCAFBCDmASC, 
VICOOCF YCAST ftCBCWHaPlOCmHAl OBC OOOMO FOR OMP OB C A iaO XnASE. 
VSCPMPO YCAtT(SjC0CVCMQPOLTAOemATl«nOI«lPK]rTCM OBC, COMPUTE 
YtCPOCS YEAST (t£B«VCIAE)POCt oec BCOOMO A CJHPPHOtPHOOCSTCnAtC 

rwcptp* YEAiT(tjce«vci«PCMoec BCOOMO ASPARrapnorcMccoMn.cTc 

YKm4M YEACr{».CCHCVItUePET«t« oec COMPLETE cot. IITttP 
VtCPVTt YCAST {t«REvawO PCTI OPC BCOOMO AOR>AT> T TW NBLO C ATOR ITJIBP 
YBCPHOO YEAST{S.CBCVCIAQPH0ftAM>PH0tOa«tOaOMOP0RWI)MiaP 
YBCPM YEAST (S.CeVMSU9Pm(PHOrORSACT1VAT10N OBC BCOOMO trap 
VBCPHRI VEAITPHR1 OBC FOR PHOTOLVABS. SWItP 

VBCPW YEAST (BjCCRCWBUO PC OBC BCOOMO PHOtPHATmLMOtfTOLtimP 

VBCPM* YCAST (tCBCVttUOPROrOIKNASS oec MOBR 

YtCnASM irCAtTtS.Ce«WSMQtMCn(MCS«UFLAflHO,COW£TCa0CMC. 

VBCPORM YSMT0.CBCVCWe FORM HRMIL COMPLCTC cot. MIBP 

YtCPPrn YCAST (t JgCVWAB P I ICWC PATHWAY WCOUJiTORY I (P>Rl> OBC 

VtCPPRt VCAST(aCCBCWtlAi)PPRIOeCMaUJKTWaOtfYDnOOPUTAftBlt31BP 

YtCPfiri VBAST (t£8«V»IAO PRTt (COCO) OBC BCOOMO TRWCUTCN M1tB» 

VBCPUn YCAST(tjCCRBVMAQPirnOeCBCOOMaPBC0e<VllR0OB«AtCISI«P 

VtCRAOf SCBCVWABRADIOeCOOMPLCTCOCOMOtBOUBCE. 811 IBP 

VBCAAOtO YCA«TnM>1tOeCMWaLWDHTICC}C8IONRCPA>tOFDNA.S«BP 

VBCMDM t,CBCMSttCRW2 0eCENCa0M0RM»PnOTEN,00MPLCnC0t.MIIOP 

YtCRAOa YCASTttjCCICWBIAqRAOB OBC. COMPLCTC cot. MtCtP 

YtCRAOSO YEAST (tjCOSMtlAQRAOa OBC COMPLCTC cot. XantP 

VtCAUMt YGMrT0.CBCVnM)nMMK OBC BCOOMO MUPKBinFICDPROTCM 

VtCRAOft VEMTf8£8tSMBlABRAMaBCC0MPlCTCC0t.t4K8P 

YtCAASi BjccncvnucMi oec COMPLETE cooMaBcauecc.coar 



Wmw WICATH 

wKTHioti WHEAT MrrvNcmTHMi oec coMLcrc cot. 7r7tP 

YCAtTCTROFCAUB) P0X4 OBC ENOOOBC ACYL<oeirVMC A OSaOAtC N 



YtCAAtHtR YCAST (S.CPCVniAC)RA»HBajATB) OBC CRA* to l.1I2»tP 
YBCAASMR WAST (B. C0CVN1AO AAtM RSUTO OBC &«AMC-S. 18UBP 
YBCnPt* YCMTfB.CCNBW)AQRCaB0HM.PRinCMtf.0CHn.CrCC0t.f7naP 
YSCRPtIA VCASTCCSKVItUONBOtOMALPnorrB«fttAOeC(nP»1A».inotP 
YtCnPfttt YCAST |8.Ce«WBUC)n»0B0MAlPnomNfttB OBC (HP»1B>. l4MtP 
VBCAPVI7A YCAST (t. CPCVWAO WOtOMW. PRCrTgH LI7A OBC. COMaCTC COt. 
VBCRPU* YCAST (t CCnCVQUOMOtOMALPROTCN US oec COMPLCTC cot. 
VOCAPUt YEAST ffjCEHCVWAQ n COBOMAL PWOreNlia.OeC CYW. 13MBP 
YtCRPtW YCASr<8jCCnCVClAQM0t0M«.PRCrTEMU«OeCCCMPLETCC0t. 
YBCRPOn YEWT(8.Ce«WWOflPOt10eCBCOOtaR»MPCL1MCnMCILWIOE 
VBCRPOBI YCAST CB.CB«VIBM)IVOBI OBC BCOOMO MA POLYUCIMSIfflftMttP 
YBCRPBM YCAST (B CBKVISMDntOtOMALPnCrTBNtM OBC COMPLCTC cot. 

vBcnms YKAsrceecvmAORBoooMALPRoroiaaaoecsncp 

VICtCCM YEAST BBCM oec HBOUWB) FOB PWPTCWA M BMLV lltTBP 
YtCtSaO YCAST tt.CCRCV»IAQBSO(S«.a(rMR0mMT10NBGOULATO^ OBC 
VSCWOO YiM;T(tjC£nEVni«Q8ia(n0irP«MMTKMKaUUTOf%O0C,
VSCSNFt VIMT4t.ce«VMMC)aNnOM»C0OiWACMMNCATilMUTIIHW 

VBCVTC80 vsACT<t.cc«v«M)CttiTvpfi«pccmDmnu«moe«.»Mr 
v«c»TS3 vGMT«i.c0«viu^iTnoB«ecoi>MonGcvi«npmfmouc»cA 

vaCSTEM VCMT0.CCI«V»M3CaiTVK«raCimn0UITGIOOS,lMNP 
YBCflTCT VEMT (i COKWOTtf) HKOUATOWV OWE fTP.COMftCTI CO. «OC«P 
YSCSUCt VEAIT(tjCa«VttlAC)SUClOeiimOOOMawlNW0T«M.OOyPlfTlCO*. 

vscnccxt vsmcoc«ae«DioooMTHVMCuxTiKMAss<nQ.«>iir 



ITTBAIfA T.IHIiPOfW.M«MPIQ0«moaOMAtMiiaittlMOLUMJ47NP 

cMn. ooinnjwwTiwMPairnM)orao«.TnM(nJNiaB«oaunm 

» CSICM FU6MO00UlCALaD«C00Mf%nTWIMMTVPn0TBITO00UCM 
csicSA H, mm cctii cs* gr bo. ■*! JIWD tY» OPW coomplcti ci»>. mcp 

CCICO PUtfMeOOUI.OCMPLCTSOOMHI.ttMaP 
CClCai PUIMB CGUI COLCM fit OB*. ttOMf 

Rjww oa^rn MMNffv raorm. comm COIL «ND ooujcM Q 
c. cou vrm oec mow mjmm coion. M«t» 



V8CTRPI VSMT(i.CB«V»»E)M/TQNaMOUSnE(UCAT10N«80UiNCIMI«M 

vaCTTVO VCAtT(i.CB«V«WE)TTV3<»SCOOMOP0ftMT>«AMAAnfYNn^ 
VBCnJM VIA»T{t.CB«WAE)KTA-TmLMa0S.IMaW 
Y8CTV117 VEAflrrCLNniTIM«P0eCNTVM7tM2BP 

vttcuAu veAftT0jce«visufi)unMoeitecooMao(vrw«4r-moePH*TV 
V8CVP30NC vfiAiT(«jCB«vwAqvnfMnMpnrTDOCOoes<HuuwC4wa^ 

VSOffTAI YCm (t.OIMTATICUt» E)aTMCBilJUIta.UCCMmA8C OM STAt. VOBP 
VSeOLUQ YtMT <C#B I IClHOtAl ICT^OtUCaaiOAifi oe«. COWPtCT CW. «17»y 
mUCT VEASrtSjCAflLaaSIQOMM) ACTMOOK. t*HV 

vsooM^ VBW(t.CAAae«aBm>(WLt»o«.iRcoiONvmM«omMfMASTMrn. 



Y8QIWLMT VMT(l.CAIUftB«>B«»)IMMWOU<l«rQOOBICOONaiMLTAS& 

VtAtT<aXAflL«0«»e«8»Mai(«J»»4MIJCTC^^ 

{•.CAIlJBO»0«*lft^"" 



vsKxica 

VtKUICt 



WCOC10 
V8PC0O 
VSPCVC 
VSPMSM 



YSPRAS 
VSMASX 



VCACr (ft. CAfajamMtt) fMOMIM. PnOTEN LM 
PUtfUO Kt R«M Ktira YCA8T LACrm COMfUTE 
VEWr «UJWn} LM OM BCCXma POSmvC RCOULATOn OF TIC 
HMWewUfOLWCIVHA OM OBC POR 0*HVmaO(V.ACCroNE tYNTM^ 
8C»aO«AOCHAROWVCC» KABS CCC 10 START OE>C. COMR^ CU. IMttP 
VCWT0#OIM)CaiOMnONO0«tCOCa).OOyPLCTSCOt. IWV 
VEAIT (t. POMQ CYTOCHROWf C OBIS AND RJWM. AMtP 
YEAST (tJQMK) HVTONi KtA«CTA OBC COWICTS CO*. MIBT 
VSMT CSKMQ liVTCW HtAAimA ilM> WfrALMA 0OIG». COm^ 
YEAST (MCUK) M» PHOWHMTASfi (PHOl) OatC, OaWLKTl COS. ItMOP 
YEAST ftMIAQ MS Oe«. COUPUn COS. tnip 
YEAST rSCMBOKAOCHMOMYCCf KASE) HAS OM. ICMV 
VEMTtSJOa^MMIOMeCOOMOALPHATViULM I.CCHPlfiTBCOS. 
YCASr(SK)MC|0MBCO0MaALmA-TU»ULNtCaMnETECn. l79Mf 
8.P0MS NDM OM SCOOMO BETATWJLMC COMPICTE COS. tO«a» 



iWMPCT* 
ANARUBP 
MATRXA 



ANAtJI0aOUMaMDCOOt»POAOil/rN<MtSYNTICTASt.COWn.CTE 
CVMmACT. MWAOM TtSO MPM(NrT1W0eMSS (»UCTASEt OM tHISP 



AHWA0M 7110 Ma » mCS 0B« MOOMO SOTH IWCAMOXVUSI 
WMABM sr. TRXA OM BCOOMO TMOneXWH OOWPUTE COS. MaOP 
MMCVmS MOUUHS PHOSPHOetCLPYItWATECARMatVUSS (PTC) OM. 



accpfCM 

BCCPPCSH 
SCCSAS^A 
BCGBAS" 
DCIAPH 



««iCVST«l«OIAi»«aoiWSrCAMOXVlASSUI«SSUSUWrOMUtNP 
A. NDULM RmL0S8-l>«SPH0SmTE CAmOXVUSE OXVOBMSS SMIU 

A CUAC n UPtr ATW &W»COCYJW M (C-PC> <LPHA INO SSTA SUOIMT 
AaUB«U.UM QUAOnuPUCATUi nnOOCYMSN MMA AND KTA SUMMT 1 0ld 

NtfVUIXUFACCNS AUMJC PnOTTiASi (APfQ OM OOHPUTE COS. 
S. AftniOUOUVAClM ICUnW. PnOTEASC #Vf« OM OaM>lCTE COS. 
IJWVUUOUMCM SACOQM MOOSn A WOUATOIir nCTOMMOf 

s. MivicuouemM suofTUSM OM eoupuni oooMo uoutMcs. 

BJC8UUS BETAUCTAtMSS ■ {BUS OM OOltfUTB COS. 1 1 S«P 
S£Ef«IB (fTKMN M) PMC OM MOOSia •CTAUCTMAtl I. UOCtP 
BjCBVUB (SmAMHMQ PB«>C OM MOOSIOSCTAiACrAiMSS I 
S£mUI SASP-lOMENOOOSOASUMLACeSOLUBUSPOflSPfCrTBH 
SjCBVUS SASM OM MOOSO ASMMl ACOOOLUSLE SPOm PnOTEM. 



BllCHM«miS PM^UMMI OM PB#. r BtO. AND PeSC&UNASS 
SlJC»»*OfMO PM IKTAlACTMMSfi) OM OOOSIO POfI taOCSP 
M SiJCHMOMSPOOHOM. COMPLETE SEOUME.I&nP 
WClLUSUCtMaOlftSSOMPMSUVnJSMCAnJSEM MOP 
SEtcnOM 
SGQUBKS.M 

OJMOATEIWM SASP«4 OM BIDOOtia ItUlL. ACOoaUMJ SPOnS UOSP 
SJMGOATEfWM SASMX OM BCOOSia BhUO. AODOauOU SPOM 01 IBP 
SJMOATt MM SABP4 OM MOOSIO tllUil. ACO^OLUBLS SPOMi PnOTEN 
BiCOATERMI SASP B OM MOOSn iUNl. AC»S01iaLE SPOnS SMSP 

tMTBWBG»N0O0MSPU«OMC0ft»LSTCC0S.TB4aP 

BMieiOSMMOOOSUSTWOStJWMTPSJIOM COMPUTE COS. tOOaSP 

ToeaNaMCOiffiETECos.«3oiip 

E BCOOSM SUMim SI. Sa. Sa. S4^ SI. 400MP 

BJUucuB aa-m (Cw x iWMPiPBCOL Ac cm Ti u Ni rpw s c om imbp 

BACtlUB PlMaUS XVHA OM COOSO POR XVUMASt. WMP 
•ACUUB BP. CaiUASS OM OOWUTE COS^ aONfi PMO. tOlttP 
BW&LUB BP. COiULASi OM OOHPLETE CDS. CLOM nSCt. lOiMP 



BSTMM. - 

BSTLOH. BJTEAWgn P WOPl CU B ICT OM MOOWO l-UCTATE DPfYP ft OOM Sl . 
BSTIASPA BJmABqn CT llOP f .US SAOP-I OM PCOOBIO A1MI AODjWtlJBU 
BSTTHTtt BJrrSMIOTWfMOPI&USPUSHOPTMniTETIUCYCLMKSSTAMCEOM 
BkSTEAncmCM0PM.UBT1VS OM MCOflOTRVPTQPHMYlTIMA I4MBP 
BJUBTTIJB AHVB«* OM COOBO PM W^TVPE ALPHA4MYiASS. tmSP 
BiSUBnJS MfVUSa OM. OOMPICTB COS. aoosp 
B JUBTLB RiMMWSI OM (Cmi. COMPUn COS. ttMP 
BSISTUi OM OM BCOOBM ONA PmMSC OCMPICTS COS. teOIBP 
BJUmiSBCrAOUJCMMSOM |1-«li1-0«STAOOLUCAN t40MP 
SJUSnUSlOLUeCWATE O P gW OISQWTRflWmjWOWTPO 




BCOS.ta 

BSUSPOA BAOSLUS BUBtUB 0 J W OM 44 IfP 
BSUBSPA BACU.UBSUarTIJSSSPA0M 000BOP0flSII«l.ACI04aUBLi« 

Bi4JBiP* BACUUSSUBmjsssp«OMoaosonRMwu.Aei 

BSUSSPD •ACttUSSUVTIJSSBPOOMCOOSIOPOIIMMX.AC] 
BSUnV BJtJBffia I H >P m PIMIIOWPi U P tlM I.CCMPIJTlC0S.S 



S BMB«n CNYSTAIM I 
» TtKS PlAiMO CMVSTAL 



COMP PlASU|}OGLC»CANCaiCMOOM:f«OtONCaNTABMaoaUCS<ES 
CBMM PUVnCOIAOMSMUNfrrPAOITaii COMPUTE COS AM COUCMO 

CGa.v8 E.coLiiYs»oMncMaAaHM>oaLS».»Kap 

CHTOMPIU OUWYPU TMCHOIMTtt OMPlU OM BCOOPO WJOU OUTCT mO 
CWANM PLUI«COUACASIC0UCMIASTRUCTUM.AM)lBMMTVOMS.97riBP 

cacai PuaMi)couBCCucMaoM.»7iSP 

C— ISil PUaM»O0l»PtCaXNBSTnuCTUW.AMBMWmrOMi^O0MPLETB 
ClAOOU CCUON A PUMHO. CaCM A OM COMPLETE COS. tItSP 
CUCPl E.C0UPU(IPLAttSOCa.Va4(»«(SUMnOUPIA)TnAAOM44lBP 

cLocSdA cuosTRsmM T ign <o co j.ua ca>OM bcoowo booolucanase a natsp 
aocas ciosTnD*uMt>eMOcaj.uiicasaMFCiiB«xxH.ucANAscs.tt«tP 

CL0R» C.PASTEUnwMUMmSKO0miOMCavurEC0S,«0«BP 

avPL S£cu Puts PLAMsoccLwnpfsiBonaup ia) thaa og«. aaibp 

OCfVKOn COmNaACTEfSUHSjMKETO4>OLUC0MKACB(U^KatRCDUCTASi 
OVIMYDQ D.MUM«84»«0AHV0(l00aW8SAWa»MMPnOTEMOBCS.00MP^ 
EAkCPP EfWflMAMmOVORALPOPnOTEMOMrRBP 

ECHA8N. E-CWYSWrnno ASN OM PCOOWO L-ASPAWAONA St . COIW.ETE CCS. 
ECmS CjCHRYtNmeSPECTATl LYASE B(Paa>aM COMPUTE COS. m«p 
eCWEU SO«1YSMm««PCCTATELYM8B(PEU)OMOOMPUTECOS.IOMaP 
ECOACE E.O0UACCE.ACe:.LPO4M> A 0M8 (ACE OPCAOPSOOOSnPOn PYRUVATE 
ECOAOA E.CaJAOAOMCO0NOP0«APAPWgreHBS0UAT0HYPWaT E MOP 
eCOAOAA EO0U(ffTWUNS}A0A0MC00S<0P0nA0AP0LYPROTm(IEaULAT0flV 
GCMOAB EjCOUAU(SOMBCOOf«aT>«AilCBPnorTEHCOUPUTBCOS.ANDAOA 
ECOAOC E.OCUAOKOMBCOOt«aAO0mATEmA88, COMPUTE COS. tOOIBP 
ECONJtS EO0UAlASOMC00t4aP0RALAim.-TnNASVNT>CTASS.ITTOP 
ECOMJtA E-COUMXAOMENCOOOMkMCTMnAOOMONAOLYCOSnASSIl 
EOMCPCFn E,OCUPROCPe«N,FROAANDFRDOOe4E8 0O0S«lP0ARMMUTEM*»P 
GCOAMS E.CCUMaQMCOO»«lP0nAPMOUCTTmTAFPECTSHnNASTMUrY, 
BCQWAaCP S.GaUMUBAOPnOMOrEnRfiOlOW AND AfWCQM Coosa PON ACTtYATOn 
ECOARACB fi£0ULWIMS»«OSSOPEA0NfARABAO|ANDAIUCOMCO0SMP0niStaP 
eCOARACK EjCOU JtfWC OM COOSO POW ACTWATOR AW PEP flCfl B J R PROTONS. 
SCOAAOP fi.CCUARQPOMCO0tlQPCR0PNmWECAB»AMOW.T fU Mft fI RA >E . HOttP 
fiCOAROI E.COUAROIOMBCOOSiaOfMmMTRMBCARWMO'AASCtOOMP 
ECOMOA EjCOII am* OM POR O-SMOLPYRUWYUMMMATE »4>H0SPHATE SYMTHASfi 
BCCMOt i.C0UAA0S0MP0A»OEHYDR00UNATE8YNTHAeS<GC4j».1J). 
BCOAAOP fiOCUAROPOM POR OMP SYNTHASE (TYI^COMnJBTSCOOSaSSOUOCS. 
GCOMOO E.OCUAROaOMOOOSn POR DAMP SYNT>CTASE(PHB«UILWMttO»P 
ECOAROUI EjCOUKItAROLANOAAOMOBSSBIOOOSMSHKBMiTEnNASEIAPBA 
E-COU A80 OM OOOOm POR MPARTie SEMULOeiYOS OOfVDROOCHASE. 
ECCMPAW LOOUWASPAOMCOOtlOPORASPARTASSIl-ASPARTATBIMliP 

ECOASPC fi. caiASPcoM POR ASPARTATE AMscrnuNsnsutiuiaip 

ECOBtAA E.C0UBnAaMC0OMaP0RBSWLABnBCTt0NM.PROmtWTrH 
CCaSTVB EjCOUlTWOMP0RT>«vrrAMNBtlRCCEPraRPRDrGNVnB.m0BP 
eCOSTVCED E.COU BTUCg) QMS BCOOWQ VffAMSI BW TIMNSPOBT hBCI ItNTtW. 
SCQCM. i,COUCAiOMPORLYSSPROTEMDCOOS)ftVPtMISOOOUOU1. 
SCOCARW EjCOU CARA AM) CAM OBCS POR CAMNHOVUPHOBPHATESYimiETASfi. 
ECOOCA E.OOUCCAOMPCOO>OT»<ANUaEOmDYtT W ANS rCT >SE.OCM>UTlCOS. 
GCOCDH fi.O0U COM OMCOOSIO POR COPOnLYCEnOSMYDRQUSfi. COMPUTE COS. 
ECOCOMA ECajPP(A,8SrAM0C0HaeCBBCO0SiaPM08PH0mCTQKitfSfi.1. 
GCOCOS B.C0U CCS OMBC00>WC0P-0<a.YCBWSSYWr>« TME . CO MPUTE COS. 
GCOCMEY ICOLICtCVOMWrTHyMOPCtaANOffMOPCWOEWS. 
BCOCPOB EjCOU CPtS OM BiCOO>«0 PgUPt OB iC r jr<YCUC tlMRP 
EOOCRP &C0UCPPOMC0OM0P0RCYCUCMWWCCPTORPRanGN.tt»BP 
ECOCYA E.C0UCVAOMCO0SIO POR ADBfYlATE CYCLASE AND CYAXOMSOSTBP 
ECOCYAO E£OUAOEMVUTBCVCLASSOP0iaKCVAOM.CaMKCTECOS.MOIBP 
nOCYSB KOaUCV1*0MeC00MaCYSaRG0UAT0RVPROTBiC0M>UTECDS. 
ECOCYTR B. ecu CYTR OM COONO POR CYT REPWEI »;>ft laMBP 
EC004M E£a(OAMOMCOOSiaPOROHAADOMICTHIUSi.1ta«P 
eCOOAPA G cou DM OM ENCOOSa OMYDROOnOOLiMTE SYNTfCTASB 1 1flPBP 
6C00APB EjCOU DAPS OMCOOPSI POR DS«VOROOIP100iMATERauCrASS, WISP 
SCOOAPO fiOCUDAPOOMCOOtaPORTETMMTDROOnOOLBAATEIItaP 
ECOOSOC •.C0UOCOOP»aH,PR0M0nRSAN00SOCOMeaOS«IP0RIB3«P 
GCOOSOR E-OOUOEORaMCOOMOPORTWOEORRVREaSORPROrEtLtinP 
ECOOLD E.COU OLD OMOOOMO POR MACTATSOCHYOROOBMSa. COMPUTE COS. 
GCCaOH BO0LI0lDOMB«O0i«lDiAeTATB0BmiR0OB«ASS. COMPLETE COS. 
eCOCNAAOP EXOUONAAOPetOHcONAAONMi AM) RPHHOSttS COONO POR ONAA 
GCOONAB B£OUDNWOMCOOSiOPORAR0UCA11ONPROrm.1«l1BP 
eoOONNt E.COU OHM OMB«C00SiaT>«ICAT SHOCK 10 PROTmtttBP 
ECOOYB S.COUDVEOMOOOSnP0ROYIPA0rratCOWUTBCOS.t40MP 
EOOCU EXOU PNC CPARTUl COS) AND BUOMS (COMPUTE C0S»BCO0S«> 
CCOPOHF EjCOU POMP OMBttOOiiaTMSaBiOPOLYPVnOC OS tMontP 
600PHUA E,CCUPMiUOMB«OOSIOD«PB««CHnOIS-IIONRGC9Ta^COiM.BTB 
B00FB4A E.C0UfaMOMBI0O0SaTHETYFClFWBRHLSiaiMr.t4HBP 
BCOPaStST C.C0UKSnAFMnSlN(P-fB«RULA#fTOBSOMtMP 
EOOfLAA SjCOURAAOMBittOOSOAPROrmiceNraauwaTMROrATIONM. 
BCOPOU Cj0OUPOUOMCOO8«>P0RDtm)nORQUTIBE0UCTAi8.1l0eBP 
eOOPTSM fi. COU mo AWPTBAOeCS. COMPUTE COSl APS PTUOMfPARTUU. 
EOOfTBQAB SjCCU FTSA FTSZ, iMO BMA OBCS COOMO POR FTSA PROim COl 
SCOnSMZ ICOUFTSaFTtAANDPTlZOBCSBIOOOSMCaLDMSIONPRCnOa 
ECOnSAA EOflUWJm<POM4NAOB«S C OCSiOPORP U aMIA SB AWDM< H IO S BttOIOP 
BOOPUR E.COUm0NREOUATORYOMPUl«0«P 

EOOOALX BjCOUOALTOMOrBCIIAPCaAUaMCOOSaPORaAUCTOSS'l 
BOOOALYS BjOOUOAULLYljLALYtNOBCSOOOSIftPORaAUTMaPSRONMPnSnOR 
EOOOAP E.O0Ua<»OMO00W0R)RDa.YCBWU»IYPi l PII0BPIW T lHBMP 
ECOOOHA S.COUOOHAOMOODiMPORNADrSPfiCinOdLUrAMATIOBIVDMO0Me. 
ECOOOIIWr B-COUOOIWOMPCOOSIOWW^OEPEMOMOUffOIATEOiMYORO OMBB . 
fiCOOUkC BjCOUOLOCANDOLaAOBCBOOOSiaPORAO^^LUOOSSSYNrHfTASiAtB 
BCOOLMS BjCCUOLNS OM COOSO ROR (■.tfTAIIYL-THMA SY W n gTASB. MSBP 
6C00LTA EjCOU aLTAOMBOHCOMOPMN AND SUCAteOOPCRONBSOOSM 
eCOOlTV fijOOU01TXOMBCOOSiaau;rAHVl.-T1*IASY*m«TAM, COMPUTE COS. 
eCOOLVA BjOOUOLVAOM OODOtO POR SCRMB MYDRO J CYI g TWLTRN M FER <i E. IMSSP 
eOOOLYS EO0U0LVB0MO0DS«aR0ROLYCYiaiSiASWrHVrASS4LPMA-ANO 
ecOOLVWA EjOOUPOSAM<DOLYWOP«SBCOCSIOP HOSWW TPI(i rH P CBi «PI«SPIWTB 
fiCOOND B JOUOPBOM COONO POR MN0SPM00UJO0MATE0iWY D ROSBI)IS S .IBTBP 
GOOOOR EjOOUOOROMBICOOi<OOUffAffHOBWgUC1AS«.CCMIMWCeS. 



fiOOOUMA EOOUOUMAOPEROttOUM APIDOUAAi 
eOOOUT L00UaUT0PMM;aur^0Un^4NOr 

E.C0UHAOaM 
BOOUPPLTi
GOOttSOP fi.O0UHBT«S«CP6KNO0NTAMNO

eeoMss s.cajMsaMr 

fiOOMSTI E.CCUK1t»flST0 
BOOM.V E.Oai |Mi| HLYC, »€YA. W.Y» AND »t.YO O 
ECONSOSK CCOLISmAMKIlKSDSOMCOOMOFOAASPEOncrrVftUBUMTOf
eCOHTPA S. COLIHTPntHCAT SHOCK n£GULATOnY>Oa«,C0UPLETt COS tOtOttP 
CCOMTPfW S.COU HTPA 08« COOMO FOR HCAT SHOCK REOULATOnV mOTEN I3I2BP 
ECO«.ER CCOU K U lER OM ENCOOVtO TV€ Kt REPRESSOR, AND ORFn B4C0DN0 
ECOLVA E£OU I.VA 0B« BCOOtM TNREONMC OEHVORATA&S. CCUf\CTS CCS, AND 
ECdVBPR C.COJ LVQ OM COOMO FOR ACCTOHVDROXY AGO IVNTHASE I, STCftP 
ECOLVE E.CajLVEaSCF0RBfWNCHE0XHAMMIN0ACOAM»IOTWINSfCRASG 
ECOI.VOE fi.COU LVOEOA OPCnON. LEADER t ATTBIUATOqt PEPTTDC GENE | LVL) ANO 
ECOLVOCB E COLtBLWCOAOPCnON LEADER PEFrDCOaCCOMR.CTE CCS. 30«P 
ECOLVm E£0ULVW0PER0NENC00MVM.»S4e«8mVSACfTOKVDA0<XYACO 
eCOLWC E.CajK1tl.WMDLVYOBIC8DCO0MaACCT0HYDnaxVM:DX«nBP 
ECOKMA fi.CCU*0HEStCNWn0a*8UBUNlTMMBO»CMiaP 
ECOKHAB E.C0UOQ«A(0FK««BCPERCN)BC0OMaAnbBnUM.MEPnaTCN. 
ECOKOPABC E COLI KOPNK OPERON COOMO FOA KDP-ATPASE PRCrTENS KDPA.4.-C , 
EC0K088 E.CtiU KOSS 0B« ENCOOMO CTF;CMP4-OEO)(Y44iM»MaOCTU.OeONATE' 
ECOKSOA E.COUK8OAGENECO0MQFOniKTHVLTnANSfERASE(WET)C0NFERRMO 
ECOLAC E.COLI LACTOSE OPERON WITH LACI, LACZ, LACY AND LACA GENES. 7477BP
ECOLEP E.COLI LEP OPERON PROMOTER, LEPA AND LEP GENES, COMPLETE CDS.
ECOLEUA E.COLI LEU OPERON, LEADER PEPTIDE GENE AND LEUA GENE CODING FOR
ECOLEXA E,COULEXAG£HE COOMO FOHSOBFUCTKMREOULATORYPAarEN. MSP 
EC0LP2t E COLI M.PAOENE COOMO FOnLP0PfWrrEM2S.C0Mn.ETE COS. IWtSP 
ECOLPP E.COLI LPP STRUCTURAL GENE CODING FOR OUTER MEMBRANE LIPOPROTEIN
ECOLSP E.COU LSPOBS COOMO FOR PR0Li>0PAOrENSiaNM. PEPTIDASE. n36BP 
ECOLSPA E.CCULSPAGe«FORLIPOPncrTEMSiaNM.PEFTVMSE AND LES GENE 
ECavSC E.COU Kit LY8C O0S BCOOMO ASPAffTOKNASE IL COMPLETE COS. 
ECOUALB E.C0UUALBnEOKMPnCUOTEaMMJ(-lAWAN0UAL£fO0PER0fe:«M6aP 
ECOUALO S.COU WAL^ |9 OiOi AND MM.O OBIES COOMO POP UM.T06S TRANSPORT 
ECCUALT E£at MALT OBS BCOOMO THE MALT PRCFTEM, COMPLETE COO, 3M«P 
ECOUQA E Cai Mao OENE COOMO FOR UG.SI06C CARRtEA IftTSBP 
ECOMeTC E.CCU M6TC OM SWOONO OETA^YSTATHONASC. COMPLETE COS. IMOOP 
ECOMETG E.COLI METG GENE CODING FOR METHIONYL TRNA SYNTHETASE, COMPLETE
ECOMETK E.COLI METK GENE CODING FOR S-ADENOSYLMETHIONINE SYNTHETASE.
ECOMETV E£CU Mm. OaC COOMO FOR ASPAfTTOKNASEn-HCMOSERMEMaseP 
ECOMETLBI fijCOU MCTB iMDMen. (V 040) 0£NC8 COOMO FOR CV8TATH0NE t4nSP 
ECOMOTAS E.COU MOCHA PROMOTER, MCrTANAMCrTBQBCSM) START OF CHEACeiE 
ECOWTLA E.CCLtWTLAOB« COOMO FORMAmrrOL-SPECinCGNrrMIOFXiaBP 
ECONDH E.C0U»OH0B< COOMO FOR MA0H0CHVDR0aO4A8E.2M7BP 
ECOMRR E.COUMRnOB«£R£OULATM0FUM(UTEAN0NrTnATEflE0UCTKM.lMieP 
ECCNn. E. COU t9\. Oe« FOR H-ACCTfLNEURAMMATE LVASfi (EC 4.1 3.3) 1 U3BP 
ECONRDA E C0LIRB0NUaE0SOE0IPHQ6PMTE REDUCTASE OPEROMNROAANONROe 
ECONUSA E.COU NU8A OPERON MaUOMO OBCS FOR HEr TWA^ (WCTY), It KO 
EC0NU60 E.COU NUS8 OBti COONO FOR M N UTI.IZAT10N SUBSTANCE. tOBP 
ECOOMPA E CCU SUA AND OHPAOENES COOMO FOR ftULA PROTEN (LCN SUPPRESSOR) 
ECOOMPB EjCOLt OMPO OPERON: OMPR M) ENVZ OBtES COONO FOR PRCFTEMS HOaOP 
ECOOMPC E.CCUOMPCANOMlCFOBCSCOOMOFORMAJOROUrERMaBRANE PROTEN 
ECOOMPF EOOUOtU^OBC COOMO FOR MAJOR OUTER WaSAANEPROTEMCUPF. 
ECOORI E.C0UREPLCAT10N0RX)M(0fliOAN0A8NAO9<ECO0t«lF0n»n8P 
ECOORUSN EXOLI REPLICATION OROMfORIC) AND ASNA OENE COONO FOR 401 2BP 
ECOPlPARA E. COLI Pt PAR REOtCN {PLAEMID PARTmON) OENES PAAA-B. NC8 SUBP 
ECOPAfiA C.CGU PAAA G0« COOMO FOR P-AMMC3B P C0ATE 8YHTHCTAS6. H08P 
ECOPUB E.COU PMB 0O«, COOMO FOR P-MINOBBCOATE SYNT>CTASC, COMPLETE 
ECOPAPA EOOLI PAPA OM. COOMO FOR THE PAP PHJ SUBUNTT. 7«lBP 
ECOPBP* E.CaiPBPB0O«(FT81) COOMO FOR PB<ICUJN«MDMO PROTEN X 
ECOPFKSK E. Cai PRCB OBC COOMO FOR PHOOPHOFRUCTCKNASE-I. COMPLETE CDS. 
ECOPHEA EOOU PHC OPERON, PHEAOB« COOMO FOR CHORSUATC I MSP 
ECOPWAS fi.C0UP»«AN0TVROPER0NS;PHEAAN0PHEOMSCaO»NlFOR4fl0<eP 
ECOPHOSt E.COU PHE5.T OPERON ATTBWATOR REOKM FROM PUSMO P91 ft WTTH S4 
ECOPH06 EjCOU PHOC OENE COOMO FOR PHOSPHATE LUrTATKM MDUCSLE OUTER 
ECOPHOCA E.COUOe«SPR0e. PROA. AND PHOE COOMO FOR Q AMMAOLUTAMYL KNASE. 
ECOPHCU E.COU PHCM OPERON. CCNTAMNO PHOM OENE WO THREE UMOOOnEO 
ECOPHOB EOOUPHOePHATE SPECnCTRWBPORT REOKM PHOS STRUCTURAL OeC 
ECOPHOWTU E.COLIPSTAfPHOTlPSTB.PHOW AND PH0UQ»eSBC0ONO PHOSPHATE 
ECOPHRORF E COUPHROENE COOMO FOR OEOXVRSOPVRftADMS PHOTO. VASE. aONBP 
ECOPM E.COU PMOatE COONO FOR DNAMVERTASfi. COMPLETE SEOUBCE. IIOBP 
ECOPMP E CCU P NVERTSLE OJBtOa AND PN OM DCOOMO NVERTASE. Mt4aP 
ECOPLDAA E.COLI PLDA GENE CODING FOR DR-PHOSPHOLIPASE A, COMPLETE CDS.
ECOPLM E. COLI PLOB OENE FOR tMSRMOBRANE LVSOPHOSPHaVAftE U. UTWP 
CCOPLSB E COLI asa AND OOKOBtCS COOMO FOR SNOLVCERa.'»PH0aPHATE3M68P 
ECOPOLA E.C0LIKI2P0LAG84EC0ONaFORDNAP0LVWERA8EL41ZTBP 
ECCPONA ECOLI PONAOBC etCOONO PEMCtlMBNOMO PflOTEM lAfPBP lA). 
ECOPOW E COLr PO*C GM SiCOOMO PetCUN«NDMO PROTEN tB (PBP tB). 
ECOPOra E COU P0» OENE DCOOMO PYRUVATE OXIDASE. COMPLETE CDS. tt748P 
ECOPROC E COU PROC OEM COONO FOR PYRROL NE CAHBOnLATE RCDUCTASE. MttP 
ECOPRS E£OU PRS OM B«COOMO PHOSPHORBOSYLPVROPHOSPHATE SVNTXTASE. 
ECOfTSO E.C0UPT8OaM»C00MOOLUCOSf-SPECnC»ZVMEn0Pt8ZBP 
ECOPTSH E COU PTSH OENE COONO for MSnDNE-COMTAtfMO PROTEN (WfQ. 
ECOPURF E COLI PURF OPERON: OGNE COOMO FOR PROIEN 17.* OF IMOtOWNnNBP 
ECOPVnBI EO0UPYB8I OPERON COOWQ FOR ASPARTATE TRWaCABBMiOVLASElSMBP 
ECOPYRBIA B.COU PYHBI OPERON WCOOMO ASPARTATE TRA WS C AHBAMOVLASB I ATCASE). 
ECOrVTSS fi.COUPVnBI OPERON PROMOTER-REOUUTORV REGION MaUDMO LEADER 
ECOPVnO fiXOUPVROOetfi COONO FOR OWVDROOROTATEOEHVOROOENASE. COMPLETE 
ECOPVREA fi.CCU OUT 08« ENCOOMO OUTPASE. COMPLETE COB >V«PVRS OENE I WWP 
ECOPvno E.C0UPVA0(C0MPLCTCC0tt)M«0B«)<PARnM.C06}WHASOCO0Na 
ECORBS E.CCUKI2RBSD.RBaA.R8SC.r«8tANDnBSK0eCSENCODNaTHEM0H 
ECORBSP EOOURBSPOM COOMO FOR DRBOBE-BMDMO PROTEN. MlBP 
ECORECA EXOU RECA OBtS COONO FOR RECA PROTEN. 13M6P 
ECORECF ESCHERRHUCOLtRCCf OM1MTHarB4D0P0NN4aB4EANOS-B«OF 
ECORECFA fi.C0URECF0OC.3r»0OF0NANOBCAN0rB«>0POYRB0B«, 
ECORBJ ECOLI Ras GENE. COUaETE CDS. >143BP 

ECORF1X E.COLI RF-1 GENE ENCODING PEPTIDE CHAIN RELEASE FACTOR 1, COMPLETE

ECORF2X E.COLI RF-2 GENE ENCODING PEPTIDE CHAIN RELEASE FACTOR 2, COMPLETE

ECORHO E.COU RHOOBC COOMO FOR TRANBCflmONTERMMATKM FACTOR. IMOBP 

ECORHOA EOOLI Kit TRXA 0B« BCOOMO TMOREOOXH COMPLETE COS. M7BP 

ECORHOB EjC0LITRXA0B4E COONO FOR THOREOOXM (COMPLETE COS) AND RHO 

ECOnC ECOU RNC OENE ENCOOMO RSONUCLEASEW.tCTBP 

ECORNCI E.COUKItfSTnANSSaMOMRNCFORRBONUaEASEn. lOTttP 

ECORNH E.COU RNHOOC COOMO FOR RttONUaEASEKTITBP 

GCOfMO E.CCU ONAO (MUTD) O0« COOMO FOR ONA PGLVWERASfi tl EPSLCN 

ECORNPA E COU RNPA MO RPMHOBCS COONO FOR THE PROTEN COMPONENT OF 

ECORPA ECai ALPHA R808CUN. PROTEN OPERON 3lft4BP 

ECOflPLN EjCOLIPARTWLSIO OPERON; COMPLETE 6PC OPERON ENCOOMO RS06OMAL 
ECORARPO B COLI RPU.RnARPLJ.flPL,RPOB AND RPOC OMS COONO FOR SEVERAL 
ECORPMBO EJOURPMB AND RWW PENES COOMO FOR nOOaOMAL PROTEMS Ltt * L33. 
ECORPOA EjCOU M.PHAOPCR0N.RPOA,Rn.O. AND RPSOOBtES COONO FOR RNA 
ECORPSA fijCOUnSPAOENE COOMO FOR RBOeOMM. PROTEN 81. NISP 
ECORP88TS EjCOURPSB we TSFGPCS COOMO FOR n J O OSOMM. PROTEN St AND 
ECORPSt E COU RPSI AND RPLMOeCSBWOOMORBOOOMALPROTENSS* AND 
ECOflPSJ E.COU RPSJ AW RPLCO0CS COOMO FOR RBOBOMALPROTSPMStOAU 
EC0RP6O EO0URPeOOB«FORR»080MALPROnNSiBANDP*»OENS(ft-BIDl 
ECORPSOP E COU RPSOANDPI^aBCS ENCOOMO RSOSOiML PROTEN 8 1» AND JOCHBP 
EC0RP8OX 8.C0URPS0 OENE COOMO FOR RnO0OUALPROTEM8t», COMPLETE COO. 
ECORPSRPO fi.COU RPSUtMAO RPOO OPERON WITH Oe«S COOMO FOR RttOeOMALSOHBP 
ECORPST E COLI RP8T OM COOMO FOR RMBOMAL PROTEN S». MlBP 
ECORPSTA ECOU RPSTOM ENCOOMO RB080UN. PROTEN StO.XOM COOMO FOR 
EC06BCB E COLI S8CB0M ENCOOMO EXONUaEASSt, COMPLETE COB. tttTBP 
EC06ERA BOOU SERA OM BCOOMO D^PHOOPMOOLVC&IATE OfiHVDROOENASC. 
ECOSERB EiXLI SERB OM MOOMO PHOBPMOBERME PHOSPHATASE (PSP) I II iBP 
ECOSOK EC0U44SnNA0M.7*4BP 

CCOSPPA EOOUSPPAOMetCOONO PROTEASE IV. COMFLETECDS.Z2UBP 
ECOSSB E.COLI SSB GENE CODING FOR SINGLE-STRANDED DNA-BINDING PROTEIN
ECOSSR EC0LI8SROMMOONOt8RNAM4BP 



EcosTRi E.CCU rm operon wtth rpsl tno rpso oms coomo for rbosomm. 

EC0STR3 E.COLI 8TH OPERON WTTH FUSA ANO TUFA OMS COOMO FOR aONOATION 
ECOTAO E.COU TAOOM COONO FOR l-MErHVLADOmE-ONAOLVCOBYUkSE I, M«P 
ECOTARTAP fi.Cai TAR ANOTAP OMS COOMO FOR Se«ORV TRANSDUCER PROTEINS 
CCOTGPAO E. COLI GM FOR TRNAtD-PRO.IOSSBP 
ECOT08 E. COU TTMA^EA^t WO 23JKD PROTEN OENES. IM48P 
ECOTOTUn ECQUTUFBGStCCOONOFOnaONaATKNFACTORTUI POURTRNAS. 
ECOTGVI ECOU TVRT LOCUS CONTAMMO TWOTmTfMA-l OMS. 1W«P 
ECOTHR E.COLI TMCONNE OPERON VMTH THR/L T>«« ANO THRC OBCS COONO FOR 
ECOTHRNF E.COU TIAS. NFC. RPLT. P»CS. P»CT AM) HMA OMS ENCOONO TTMBP 
ECOTHYA E COU THV A OM COONO FOR THVMDVLATE SYNTHASE. I I6JBP 
ECOTNAA E.COU TNAAOM COONO FOR TRVFTOPH«NA8EAM>rRjlNK.»«3BP 
ECOTOLC E.COU TOtC OM OCOONO OLffER M CMOR AN E PROTEN TOLC, COM^En 
ECOTONB E COU TONB AND PM OMS. COMPLETE CDS. I H7BP 
ECOTOXA E.COU TOXA OENE BCOOMO WOUm A Of- HEAT4ABLE BfTEROTOXM 
ECOTPU E cat TP1A CM ENCOOMO TRKSEPHOSPHATE BOMERASfi. COMPLETE CDS 
ECOTRO E.COLI TROOM COOMO FOR TROCHQMFTAXa PROTEN. COMPLETE CDS 
ECOTRUD E Cai TTWO OPERON BCOOMO TfMA(M|.0)METHVlTnANSFERASE. 4ftMfiP 
ECOTRP E.COLI TRYPTOPHAN OPERON: ENTIRE DNA SEQUENCE. 73K6P
ECOTRPR E.COLI TRPR GENE CODING FOR THE TRP OPERON REPRESSOR PROTEIN
ECOTRPS E.COLI TRPS GENE CODING FOR TRYPTOPHANYL-TRNA SYNTHETASE. lOoeOP
ECOTRXA E COU TRXAOM BCOOMO TMOREOOXHCOMaSTE COS AND RHO CM. 
ECOTSR E COLI TBR OM COONO FOR METHVL JCCfiPTMO C»«tOTAXia PROTEN L 
ECOTYfB E.COUTYRBOMFORAR0UAT1CAMMOTRMSFERA8E.1U7BP 
ECOTVRR E.COLIKIt TVRR REOUUTORY OM BCOONO TYRR PROTEN, COMPLETE 
ECOTYRS E.COLI TYRS GENE CODING FOR TYROSYL-TRNA SYNTHETASE. 1 27SBP
ECOUMUCD ErOUUMU OPERON: UMUO AND UMUCOBCS BCOOMO RECA ANO LEXA 
ECOIMC E COU ATP (UNO OPERON CONTANNO WE GBCS COONO FOR ATP 7M0OP 
ECOUNCA E COU ATP (UNO OPERON ENCOOMO PARTIAL ATP4YNTHA8E COMPLEX: 
EC0LWC8 E COLI LNC8 0M ENCOOMO THE A SUBUNTT OF F-lF-«ATPAS£,tt«BP 
eCOUNCfi fi COLI IMCE OM FRACMBn^ BCOOMO F-l F-O-ATPASfi C-SUBUMT FOR 
ECOUNCF E COU IMCFOM BCOOMO THE B SUBUNTT OF THE 4«3BP 
ECOUVRCA E COLI UVRCOM COONO FOR T>CUV REPAIR PROTEN UVRCNOOBP 
ECOUVRO E COU UvnOOM FOR ONA naCASE 11 COMPLETE COS. tMSBP 
EGOXSEA E COLI KSEAGBtE COONO FOR T>C LARGE SUBUNTT OF EXONUCLEASEVU 
ECOXVLAB E.COU XYLA AND XYLB OMS FOR O-KYLULOKMASE (PARTIAL) AND ItlWP 
ECOXYLAftA E.C0LIXYLAGMC0O*CF0flXYLO6EIS0MERA8E WOXYLBOMCOONO 
B4TDFU PLASMrt)aO0Fl3.OBC8K.L.C0WLETEC0S,M>aOACMGM. 
BITDFISB PLASUIDCLO0FI3 (FROM fi CLOACAE) W.33% OF MAP. BACTERIOCNOPCRON 
BfTVPARB PLASMIOCLODFl3,LOENEAtOPAPDREOM3N.««aBP 

ENTOMPA ENTEROBACTER AEROGENES OMPA GENE CODING FOR OUTER MEMBRANE PROTEIN
ETALVOEO E.TAROAIVOEOA OPERON LEADER PEPnDEGD4E. COMPLETE COS. 30«aP 
FDtOVPA CYWOBACTERIUM ( FOVLOBIPHO^Q OVPA OM BCOOMO OAS VESCLE 71 OOP 
FOrSBA CVANOBACTEnUHfFJMPLOSrHON) TRUn* HERBCOE4N0N0 PROTEN 
FPUtM F PLASHID 42.M-U.* F SEOMBfT. TMBP 

FTUCO F ruSUO ONA COMPLETE MM-F RE0K3N (F COOROttATES 40 JOlFTO 
FPLORMC FPLASMPDREPLCATKNOnON AND MCOMPATVtJTV REGION (NCC). 
FPLTRAM F PLASMIO lE.COLt TRANSFER OPERON, TRAM. FNP. TRM. TRAY. TRAA, 
HALBO HALOGACTCnUM HALOBRJM BACTERKKIPSM (ttO| OM. IttlBP 
HALB06F HALOBACTERMMHALOBMMBRPOM NV0LVEDHBOPOBIEEXPRE8SKN 
HEWrrs H HABylOLVTICUB ONA WCTHTLA8E OM. COMPLETE COB 147ttP 
NSIWECO NSERTXMaeMBfTBtM FROM E.COU RRl.miBP 
NSIECLAC NSERTIONaaefro I, FROM E COU LACI 0M,M4BP 
NBinOSD NSERTWN aEIWfT 80-Bt (NUXI). FROM S. DYSBfTERUE. WnP 
NS1S0 MBERTION aaiBIT 181, FROM S.0Y8EMTERUE.U3BP 
NOtMPVU NSERn0NS£OUBCEIS2tRFR0MP.VULOARaFUM(NaTHJ«M,MtBP 
MMECO NSERTKNaEWBTTISaO, COMPLETE. FROM fi.Cai Kit. 1271 BP 
NSaUM NSERTION aB^BfT IBS. FROM UMBOA KH10Q. 1 1HBP 

J01CO PLAGMO PJ01 FROM NEaSERWOONORR»CAE ONA, COMPETE OEN0HE 420TDP 
KAELVOED K.AERO0B4E8I.WE0A OPERON LEADER PEPTDEOM. COMPLETE CDS. 30TBP 
KAEPABA K AEROOMS PABA 0B4fi BCOONO PARA-MNCBBaOATE SYNTHASE U4BP 
KAERBT n.EBSmAAER00BIE6R«rT0L(R8T) OPERON CONTROL REOKM. 13*1 BP 
KAETRPA KLSSeiAAEROOBCS TRYPTOPHAN OPERON TRPA0B4fi.wgeP 
KPNMSO K.MUMONUE M8T1DNE CCMTTKIL REOtON (HBO *N0 V MSG OBfE. 
KPNLAC K.PNEUMONIAE LAC OPERON AOlSP
KPWiFA KLeBSiaUPNEUH0MAEMFAOM,197tBP 
KPfMFW K PNEUMCMAE MF H WO PARTIAL NT 0 OMS. »-RJM(. tMTBP 
KPIMTTU KLaseiAPNEUMONUENTRAGMFORNrTROGBtREOULATKM. l«3«&P 
KPMmoO KLEBSCLLAPNELMOMAENTRBOB^EFORNrTROOBIREOULATKN 140«P 
KPWrRC KLEBSeiAPNEUMOMAENTRCGM.ISTnP 

LB3H0C LACTOBACU.UB 30A HBTIONE DECAnBOXYLASE GEM iM> MUTATION tlMOP 
LCADHFR LACTOBACILLUS CASEI DIHYDROFOLATE REDUCTASE GENE (DHFR), COMPLETE
ICFFDHM HETHNCBACTERILMFOfWlCKUNFDHAANOFD*«OM8BCOONOTHE36»7BP 
httMSMI WETHANOBREVBACTER 8Mm« ONA WITH MSERTKM aOIBfT I8M1. aoOOSP 
MBHORFW METHWOBREVUVTERSMfTMl ONA WITH TWO UMOBfTFIEDREAOMOlTtOBP 
HVONrn HETHmOCOCCUSVaTAfiOM HOMOLOGOUS TO MFHI472SP 
MXAS WVXOCOCCUSXWTHUSOBCS two 2 FOR PROTEN S.3MaSP 
HYCRPS* HVCOPLASMACAPRCOLUMRPSHfVlFANORPLROMSCOOMOFORItMBP 
N00PL1 NlOONORRHEAE STRAM USI t PLUS OM (PLBl), COMPLETE. t73BP 
NRIUER PLASMID NR1 MERCURY RESISTANCE (MER) OPERON. 37UBP
PAB4ERD P6EUO0M0NA8 AERUOMOBA PLASMIO RtOO UERCURK RESBTiMCE OPERON 
PAEPAER7 P>EflU0M0BAnA8MK)PM070BOF0nB«0NUaEASEANDtCTHYLASfi 
PAMML8R aASMIO PAH77 lU RESBTANCE OCTERMNANr SCOUBCfi, COOMO FOR 
PBOK PLASMS PUBl 10 (FROM S. AUREUS) KANAMYCM NUCLEOnOYLTRAWFERASE 
PB2CAT fLASMK) PUB lit (FROM S AUREUS) CtCOnAfcnPaCCLNH BP 
PBF4£nMF RPLA8M0PBF4 (FROM ».FnAOLIS)EfWFGM ENCOOMO UACROUOfi- 
PCI PLAGMD PC1H. COMPLETE GENOME. ttlOBP 

PCICAT PLA&MDPCIH (8 AUREUS) CI«.ORMIPHe«C0LACCTM.Tnw8FERA8fi (CAT) 
PC2Ca PLASMD PC221 (8 AUREUS) COMPLETE OBCME. 4H«P 
POUUER PLASMO P0Ut3M (FROM SJMARCESCM) MERCURIAL RESISTANCE (MER) 
PEt PLA8MIOPfitW.COMPLETfiOBIOME.273nP 

PCiMER PLASMIO P(2S« (FROM SJUJREU8) MERCURY REStSTMCE (MER) OPERCN 
PP404SCN PLASMO PIMM {FROM CLOSTRJ0IUMPERFRM0B«)BCNOM BCOONO 
PJHAPH PLA8UD PJH1 (8 FAECM.B) WNOOLVCOSOE PHOSPHOTRArSFERASS TYPE 
PJR»*H njOWO PiRZtt l»H OB4E COONO FOR KYOROMVCN B PHOSPHOTRANSFERASE 
PLBECORV PLASMS Ul (E COLI) ECORV OBCS FOR BO0NUCUA8E A METHYLA8E. 
PMOHOC MOROANELLA MOROMM HBTIOME OECARBOXYUSE OM. 12HBP 
PMORLPP M MOROAMI PROLIPOPROTEN OM ANDft' RAMC, 632BP 
PNECO PLASMS Mill (FROM 8 EPOERMKXS). COMPLETE OBCME. 336«BP 
P09RSA PLASMIO POA02 FROM RAVOBACTERMM SP. KITl NYLB OM (RSSA) I467BP 
POORSa PLASMIO POAD2 FROM RAVOBACTERWM SP. K1 72 NYLBr OM (RSIB) 
PR1TRA PLASMORI (RESISTANCE PLA8MIOTRAM.TRAJPRaTEN8. COMPLETE COS, 
PR2VP1 E COLI F4JKEPt>SMN>Rl-tt(SUB0R0UP ID TRAA ENCOOMO 42IBP 
PHECOR PLASMS Rt13 FROM E.COLI. ECORI BCONUCLEASE ANO METHVLA3E OMS 
PROIRM P.STVARn P8TI RESTRCT10N JMO MOOnCATION OENES. COMPETE. 
PSCIOtORt PLASMIO PSCtOlORKlM OF REPLCATKM-ttOlSP 
PSCREPlOl PLASMS PSC101REPI0IOB4E. COMPLETE COS, tItTBP 
PSCCPOt PSEU00M0NA8SP. (STRAM RS-U)CARDOXYPEPTSASE 02 (CP02)GM. 
PSEETA P.AERUGINOSA EXOTOXIN A STRUCTURAL GENE (ETA), COMPLETE SEQUENCE
PSEIANMN PSEUD0M0NA88YRMaAfilAAMAM>MAHGBCSBCOO»«OTRVPTOPHAN347caP 
P8ENAZ PSEUOOMONASSYRWOAiNAZOM ENCOOMO CfiNUafiATCN PROTEN 
PSa IP PSEUOOMONAS FRAOl LIPASE OM. COMPLETE COO. BD4BP 
PSB4PC TOL PLASMIO (FROM P PUTtOA) XYLE OM BCOOMO METAPVR0CATECMA5E. 
P&EPK.PAX PSEUOOMONAS AEROOMOeA(PAK) PLMOnE, COMPLETE COS. I22«P 
PSEPK^AO PSEU00M0NASAERU0NOSA(PAO| PLN OM. COMPLETE COB. 123aBP 
PSfiPLC PASnUOMOBA PHOSPHOLtPASfi C (HCAT LABU HBAOLYSN) OM. MMBP 
P8MN002 SYWMEOAPIASMS (FROM RAe.R.OTDNO0ULAT1ONOB«ESN0OA,M00B AND 
PSN PLASUD P8N2. COMPLETE OBCME. I2B«P 
PTi PLA8UIDPTl«t.CCMPLETEO0«OME.4431«P 

PTBK PLASMS PTB9I3 (FROM A THERM0PM.CBACL1US) KANAMYCM IIDOOP 
P1MPAAC31V R PLASMA) PWPTB (FROM SALMONaLA8PTAMNOOLYCOBSfi^3)t3T(«P 
PWPAMFH PAAASPONU RNCOBtUM NmraOSMSC OBC, SOOOSP

moOTfUM PUSUOniOOTfUUMmPaCNESDCOO»«anUNSFERANOFCRn.rTY 
niOJ f)UtSMiORm(«34«4M«4,RCn.lCATKM.»C<WPATm.rrVMOCOPV 

nio4 n.ASM0RtM(BCQtMNaATu.»Mi4iBinNscRrncNaBefr. 

R10PL E.COLI P4.KE PLWO RtOO-l (SUBORCXIP IVt TRAA OBC. 43SBP 
nacPL E.COLI P-LKE PLASMO RIM (SUBGROUP IA| TRUOBC.^AlBP 
RMDHFR PUttMIO RM TRMCTWPnMRESfiTWT UNVDAOra^TC RCDUCTA8S (OHFR) 
n4«OXA2LB PLASMID »<• (FROM 8 TYPH*IORlUM) OXA 2 BETA-LACTAfcUSC QO«. 1060eP 
RUAAfiA aPUkSMIO Rfc3»-« AAOA {STRCn-OMYCmPtCT^IOWYCW »*40W> 
RMCOPA PUSU»RMRCPIXATKMCC»mKXnaNCOPAOB«C0OM0R)HRNAIA. 
RUREPCm PUteu»RMRCn.CATK3N(X3NTnOLSYrrBMHUaHCNT;COn.C(yArMCA 
RtTDHFR PU^X>mrTYPCnO»fYOR0f<XATGREDUCTA&E(0W«aM. COMPLETE. 
RTTARO PU8tM)RmARSB«CALRCSaTANCEO(>CR0NOBCS.AnSA.AnBaM> 

RAim PtAS«CRAtTETR«Va.H6R£»«TAWCC REPRESSOR (TETH AND 7MBP 

RCACYCA »«O0PS£UO0M0NASCAP8U>TACVCAOmEBCO0tiaCVTOCWWM6l»*aP A00C8PE Hl*l« AD04CWM SUBOftOUP E (TYPE 4» OHM»€)»lO PTOTEW. COMPLETE 



VCKTOX VORn CHOLERAE TOXA AND TO)S O0ta FOR CMOLERA DtfTEROTOXM 1 1 ISP 
veSOOPL PlEOONATHaACTfRIOCUPRENftUPEnOiaOCOeUUrASiaBtf. COMPLETE 
VSTDH veniO PARAHAeWXYTKUB TOH QBtf ENCOOMO TXRMOSTAOLE OMCCT 

VIRUSES

iC2E2ICG R£C0MBMAFfrAVMNRETROvnU8MH]E}1.COMPLETEOmCME.M30eP 
ACBERBSH AVUN ERYTIAOSLASTOM vnuS{AEV-»4 V-EI»«CNC00e«. 19f1BP 
ACf RUtUMieARCCMA VIRUS (lMNTEaRATE0OnCULAf^.C0Mn.ETEaa40ME, 
ACM AVtMWVaOCYTOMATOGn VIRUS MC»(PnOVinM.). COMPLETE PM 17790? 
ACRPOL»V RET1CUO»DOTHaiOSaVtflUBSTTUMA,PnOVtRM.,POL.»VAN03'LTR 

iCRAa RcncaoMxmtanets viRue smAt* t, proviral. oncoom tv-flai. 

ACV AVIW SARCOMA vnuSVTS. COMPLETE OENOMS.37IMP 
AOaca ADeMMRUS TVPC I. CCMPLETE OBIOMS. 3M970P 
AOCf* HUM«MADeiOMRUSTVPC3neCRaENi.COMPLETVCOS.AMOPROT¥MS 



RCALM RHODOPSEUDOMONAS CAPSULATA LIGHT-HARVESTING I (LM) STRUCTURAL
RCARCl RH00OP3EU00M0*tASCAPSULATAPHCT0eVNTMETCOe«aUSTEflRXCaLOCt. 



ADEA2 



ADetOMRUS TYPE «: LEFT a» OF T>« OENOME tCOORDffMTCS 0%TO33J«% 
AOeWMRUS TYPE B: HEXON. 23K, DMA BtONQ PRGrTEM. lOOK OOCS 



nCARCZ RHOOOPSEUOOMONASCAPSULATAPHOTOeVTmiCnCOBCaUSTERRXCALOCI. A0CE3CP ADMMRUS TYPE k. REOO* G9 t* KO OLVCOPnOTEM MfMA. COMR.ETE COS 

RHMPDK PARA8P0(«ARHOOeMyiNIR>»«NIW OWES COOtW PGR THE ALPHA. *«0 

RMmrOKO nMI20SJUMJAPONCUHWR9*(OPCRaNCOOMaPOR[)Mm»OOM8S«.PHA 
RHiNFH RHeOBIlMMPOMCUIMFHae«BCOOtUNrmOOBtASfiFE.ta(IQP 
RH.nXZ RHOOetlM lEOUMilO O * n UM nXZ OENC for MTROOOI fixation. COMPLETE 
RM.CMOO RHBOWM LCOUUMOttAAUM NQOOATKN OB«S NOOA, NOO* MO NOOC. 
RHXOOOFE R.LE(UJM««AAlMR>SMOPn.tJ1H000FERE0KMF0RN00ULATKM. 
mOIOOU fLLEOUIM08AmMN00ULATKM0BCSN00lilM}N0OJ.C0Mn.ETECDe. 
RMtHSNAO RMaa.OnMCaAPtMMK>PnMG4tB8YMHBNaB«S^e,C.AN004M«AP 
RMMIFA RMZOBIUMHaxOmNIFAOmE. ttnBP 

RWMOO SVM HEOAPLASMlO(FnOM AUaLOH 1021) N0OO.NOOA.N00a MO NOOC 
RK2K0AA PLASMOnUKOfUOM. COMPLETE CO0M0SEOUeCf.il OOP 
RKTTRFA PLASMID RK2 TRFA OetE COOMO POR PZW MO PW. MO Pill OENE. 
RPITET PlASMDnPlTCTIVkCVCLM6R£8tSTMCE(TET>DCTEmi»wnB:REOUUT0flY 
RRUATP RH0OO6P«U.UM RUBRUU ATP OPEflON MCOOMO Fl 4TP 8WTHASE BUBUNTTS 
RRUHOC RHOOOBPIR&IUU RUBRUM iUO HaOCHROME OBC. COMPLETE CDB. 4neP 
RRURUBPL RHOD06PtRLLUMRUSRUMRaULOeED«PH06PHATICAnDOXVLASfi.l6MeP 
RSSCVC2 RSPHAER0OC8 CYCA Oe« BCOOMO CVTOOAOME C4. COMACTE COS. 
RS5FBC RMOOOPSEUOOMONAS SPHAER0IDE8 nC OPCRCN (nCF, FBC8. FBCC). MMBP 
RSSRCM RSPMACR0I0C8 REACTION CmTER PROTEM U SUBUNfT O0C MO V R>N(. 
RTSREPA PLASMD RT8I FRAQMSfT (COMPLETE MM-RTSlI COOtIO FOR RCPA PROFTEN. 
RVWRCH RMOOOPSEUOOMONAS VnOePHOrOBYNTtCTCRCACTKM CENTER niBP 
SAUV1SPA 8TAPflVL0C0CCUSARCUBUt;TANr8TnANV1SPA0OCF0RPRQrTEMA.MMBP 



A00»C6P Aoewvnus type t ss^ona-bmomo PROTEN. IMTBP 

AOOL ADB«]VIRUaTVPE7lEFrOOOFTHEOENOUC:IMPCOORD»MTE8 0.MTO 
AOU ADm0VIRUBTYP6l2LEFra*00FT>COB«0ME.MA^CO0RO»MTESO.0TO 
AOTA TREE SHREW (T\JPMA) AOOOVnUB EAAV R£OnN(EtB) ENCOOMO SMAU 
AEBNCX BCMNE ADD«OMRUS TYPE 3 (BAV4) HEXON OBC. 2MMP 
AESLS AOENOVnUS TYPE 7 (SSMQ LEPT BtO Of OeHOUi: afr4.t lUP UMTS. 
N(V my MURMi LEUKEUU VIRUS. COMH^ETS PMMRAL OB«OME. U748P 
ALMCOlZ ALFALFA MOSAJCVnuSiSTRAM 421 LEOe^RNAt OF COMn.ETEOB4CMC. 
ALMCOZZ JU'AL/AM06ACVnuS(STnAM43SLEI)B«RNA2 0FCOMR.ETEOeCME. 
MMCOU AJAlfAMOBAC VIRUS {STOAM 4a MAOISOPOHNA] OF CCMPLETE 0B40ME. 
ALMPAOOet AVIM MVBjOSLASTOOtS VIRUS TnANSFOf«MMOOe«*arM>.tt2aBP 
M.RCO ROUSSARCCMAVnuStPAAOUSSTRAM. SUBGROUP C) COMPLETE CD«OME. 
M.R02 ROUS SARCOMA VftUS (SCMWOT-RUPPM A) BW-SRC^TR. »S«BP 
«.RDf«lt44 noue SARCOMA VRU9 (RECOVERED TO MUTANT 1441) S«V-6AC^VTR REGION 
APWtKL FOOT AND MOUSE OtSCASE VIRUS (FMOV.STRAMO-l-IOLSEOMBfTnNA 
APHRNA FOOT AND MOUTH DISEASE VIRUS, COMPLETE POLYPROTEIN CODING SEQUENCE.
AUCNPVP AUrOOAAPHACALtFORMCAMNPV(ACIMPV)PCLYHSORMOOC. I149BP 
BBVIO BLACK BEETLE VIRUS (BBV) RNA1 OF COMPLETE GENOME, )t06BP
BBV20 BLACK BEETLE VIRUS (BBV) RNA2 OF COMPLETE GENOME. 1M«P
BLCNCNP U CROSSE VIRUS OENOMSUOLCCIJLCSCONA TO OENOMK ANA. Ml BP 
BLCSmA UCnoeSfi VIRUS smA.C0Mn.ETE,COOMaR}RT»CNAN0 res M4BP 
BLV K»MSLEUKEUIAVnuS(PROVIRAL), COMPLETE 0e*0Mfi.t7l4BP 



SHOOMPA SMOaiADVBBfTEflUE OB* OMPABCOOMOM OUTER MBSRANEPROTEM. BLV»V BOVVSLEUKQtUA VRUS (PROVIUU DIVOENE ANO POBT EMV REOiON 



BTVU BLUETONGUE VIRUS UOBCCOMR.ETE COS. I773BP 
BUMH BUPfrAUWEAAVmuS IMA SGOUOfTM. COMPLETE. COOMO FOR A POLVPAOTEN 
CPMCOB COWPEA MOSAIC VIRUS BOTTOM COMPONENT RNA (B RNA) OF COMPLETE SMS6P
CPMCOM COWPEA MOSAIC VIRUS MIDDLE COMPONENT (M RNA) OF COMPLETE GENOME.



SHFCRP SMGBXA REACRI IB CRP GBS BCOOMO CATABCUTE OOC ACTIVATOR 
SMACMA «RRATWMAWCESCENSQa«CHUF0RCHTt«ASEA.OBBP 
SUAtVOE SJ*WC£8Ce«I.VOE0*0PER0NLEAOERPEPTI0EOe«.C0*PLETiCOS. 
SUACMPA SMARCESCBISOMPAOeiECOONaFOflUAJOROUrERUMAMEPnOTEM 

SMAPABA SEmATUMARCESCOOPABAGENEBXCOOMOPARA^AMMOBeCOATESVHTHASE EUENV EOUMS MFECT10UB ANBiAVtRUB PRCMfUL Pa 9 040) MO MVOOCS. 

8MARCLPP1 SilAACESCae OUTER MBiimANCUP0PROrEMaO«,rnjM(« CDS. t74BP EMOP fiOUMCM«T1OU8Ma«AVnuS(EMV)PnOVIM.DNA.aAaAN0P0L 

8MATRPG S.UARCESCe«TnPOPER0H:TRf>OaB«M0RJW«(S.7448P EMCPP BCEPHALOMYOCAflOms vnuS(EMO| PWAPOLYPROTEMaeC.TIMSP 

STAOnB » AUnCUeO#TEROTOXNBOO«I.C0MPLCTEC0S.I7iaP PC8FOR OARONER^ASHEiOFaMC SARCOMA VIRUS (Pn0WRAMP70OAO*ORI«68P 

STAOEM SJUJREUSOEHOEfCBCOOMOLPASE (GLYCEROL ESTER HVOROLASE). fCSOAONC FaMB SARCOMA VIRUS (OARDNERARNSTENITWNSFOflMNO OWE. »»«P 

8TALPWH S HAaiaVTICUBLMAOB«0COOMGUNCOBAMOfi RESISTANCE. IM»P PC88M0NC FELM SARCOMA VIRUS (MCOONOUM BTRAt« TRAMFORMMG 0D4E. COOMG 

STAtr OTAP»«L0COCCUeHVCU6Lr0BWCO0M0P0RUPAB€.t21»P FCSVPOR Q*RONERJ«S*<e)FaME SARCOMA VWUS(GR«»V)V*ORONCOOO« 

STASPA SAUfCUS SPA OBCCOOMO FOR PnOTEM A, COMPUTE CSO.IttlBP PCVBM OARONER-AfMBTEM FB.V SUBTYPE B (OAmv B). PAOWUL, «vaOPE 

STWaOH STWEPTCMYCCS PLCATUS P«0OBETA«^ACET>L0LU C 0 0 AMMOA BE H OPg. FCVOftC FRM LEUKOMA VIRUB (SARMA) PROWUL SIV (OPTO) GDC AND VLTR 

STMbtE STREPT0MVe£SERyT>«AEUBenMEIOD*EeCOCMGflflNAI**^»MN0iMCi6P FCVOAOIV FRME LEUKEfcW VWUS Fav*. GA STHAM. WV GO* T PLAMC * LTR^ 

STWTTW STREPTOMVCESGUUCESCOttTYROBMASE OB*. COMPLETE COB. MIBP PCVaB« FR V SUBTYPE A CLASOOWfr I PROVMALENV OBI E. COWL ETE COS AND T 

STRASOM SMUTANSASOOBCENCOOPia ASPARTATE BETA«£MWLOEHYDfil400eP PCVCP FEUNE LEUKEMM VKUS (Fav«) GAG^KX. JUNCTKM. GAG 08* XSMBP 



8TRATPK S.FAECM.e K* ATFASE OB*. COMPLETE CDS. M43BP FWNA 

STRHYG STREPTOMYCES HTQROSCOPCUS HVO OB* 14MBP FUHA 

STRLVTFN 8 PNCUMONME LTT GENE BtCOOMO AUTOLVSN. t2l3SP RJHS 

STRM* STREPTOCOCCUS PVOOC»CSaMiaENSCO0Ma FOR MtPAOTEM. til IBP RMS 

STRMWMXP SPNEUMOMAEMNJCMOMALMOBCSBCOOMGUGMBIUNEPncrTENANO RMA 

STRSKC STREPT0COCCUBEOUaMLB(HUA}aTREPTaKNA8fi OB*. COMPETE COS. RBM 

8TRSPOIOP STRSPrOeOCCU8SP.(LMCEF«.0 GROUP 0)SPO OB* ENCOOMO MtMOBP RSNA 

STRSTRAVO STREPTOMVCESAVIDMIGB* FOABTREFTAVOMUaSP RLMS 

STYALR S.TYPNMURIUM N.R 0B4E ENCOOMO ALANME RACBIASE. COMPLETE COS, FUMS 

STYARABAO S.TYPHMURIUM AAMADOPERON: ARAB, ARAA. AND ARAOOBCSCOOMG FOR flPHA 

8TYARALC S.TYPHMURIUM I ARABNOSE OPEROPfflEOULATOfff REGION MO C OB*. RVHA 

STVAROMPM 8.TYPMBIURUIAR0A LOCUS MMOLPVRUVVLSHKtMTl^HOSPHAntnSP FMIHA 

STYCHEW S.TVPHMURUICHEWaB«EBCO0M0APURMi«NDMaCHa*arAX)S77QBP FWINA 

STVCHEY B TVPHMUnUM Me6« OPERON (CNBIOTAXB), CHS GBIE T BO. OCV FW*»r 

STYCRP S.TYPHMUnnjM CRP 00* ENCOOMO CATMOUTI GB« ACTNATOA WMP FM«Pl 

3TYCVSB S TYPHUURBM CY8S GBIE. COMPLETE COS. ITMP m*n 

STYDACe 8.TYPHMURHJU DADS GENE BCOOMO ALANME RACEMASfi. I t40BP FM«P3 

STVaGHII SNJyUNEUA TVPHBAjnMIH-M GBIE BCOOMO PHASE I RAaELLAR14aOP PW7W 

STVPLGH2C 8.TVPM«URUIfLA0Ql.MCONTTWLB.aiBfT(W«ANOK>ae<. tOWSP FWOHA 

STYH2HM S.TVPWtlUniUMHlPUKiaiNANOHMaBCS,MaU0MaNVER8nN RIOM 

8TVHSOP 8.TYP»WURIUMHB0PER0NCCNTn0LREGnNPflCCEOM0MSOOe*.m4aP FWONA 

STYM3TO S.TYPmiuniUMAAarOBCSH*nOMETRANSPOffrOPERONL44«aBP FMONP 

STYLVPA S TYPfWUfUILVOEDAOPERON LEADER PEFTOS OB*. COMPLETE CDS. FMGNB 

8TYLEU0 SAUK»CUATYPHMURUHL£lJOGB*acaOB«OIPWaGMERASC.I0r4aP PUOPl 

8TVMETJ 8.TYPHMURUMHCTJ 08*. COMPETE COS, AND MVTB OB*. PAinUL. FMOPt 

8TV0MPA S.TVPHHURrUM OUTBt tMnw* PROTEM (OMPA) OB*. COMPLETE CDS. PW0P3 

STYPABA S.TYPHMUnUM PASA OB* GNCOOSO PARA-AMMOSENZOATE SVIfTHASfi W4BP FUJNA 
8TVPOTA SALMOPAIATVPHSMUUMPHOBPHOOLVCERATE TRANSPORT SYBTBI ACTIVATOR FWHA 

STYTTtPSA S.TYPHMURIUMTRVPTOPHMOPERONTRPe«TRPAaB«S.>O00SP nMNS 

TnTTMR T1PLA8MU}(A.TUMEFACeaXN0PMJ«STnAMT37.TMR GENE. COMPLETE FWONA 

TVWVIR T1FlA8MK>(A.TUM^ACIBe>N0PALMEffTnAHCS*,HOVLOCUSMVin FMOM 

TIPCm NTEaAATE0T1PLIiaM0(A.TLUEFACeMOCTOPt<STTUI4TRANSCAffT7 FMGP4A 

TVCT Tl PLASMO ( O C TOPME 5TRAM) FROM AOROSACTERIUM TUM CF A CCN S T-ONA FMCNS 

TFCTTKBR NTEOfUTEO Tl PLAMO (A. TUUCFACee). T-flSOKM .7 KB TRANSCRTT MBMA 

nPCTTW Tl PLASMO {OCTOPME STWMPTIACMIM A. TVMSF«OBft)TWGBC. PMTW 

TVHR Tl PLASMO PTMNC OCTOPME (A.TUMfiFAO0«). TUMOR MORPHOLOGY GB* PtlTNS 

TWMSt TTPUWOmAiNC OCTOPME (A. TUMEFAOB*) TMSg GB*. COMPLETE 

TV)M82 TIPLASMTOPTIAMC OCTOPME (A, T\iMtFACIB*)TMBI OB*. COMPLETE 
T1P0SD7 n>aHK)TI9TfUMT37 NOPAL WE SYNTHASE (NOS)GB«.tia IBP 
TtPTZS Tl PLASMO |A.TUk*FACB*: NOPALMB T37) TRAPeiSATM SECRETION 
T1PVIRC PLASMID PTMNC OCTOPME (A.TXMffACKNS>V»C LOCUS BCOOMO V«Ct 
TIPVIRO T1PLAS«OrTUMNC(FnOMA.TXMEFAC(B«)V«IDOPERONBCOOMOA 
TlPVmO PLASM»PTM(A.TUMEFACtBe)V«» GENE. COMPLETE COS. ITWSP 
TFMIOTCTH TRANBPGBONTNtOTSTRACYajNSRESeTANCE AND REPRESSOR GENCBTCTA 
TRNIMIST TRMSPOSONTNlMlBICCOt«aHEAT«TAaLE(BT>TO()aNiMMP 
TRN17S1RE T1VW«POSONTNt7»TWnGBCPOnRCBOLVASfi.7«IBP 
TRNIITNPM TnANBPOS0NTW1M0OULAT0RPfiaTSM(TNPI«GB*.RESaVASE(TNPR) 
TRNtlTNPR TRMSPOSONTTetTWIGB*. MtnP 

TRNI TRMSPO60NT1«fTNPA.TWnAN0BffTALACTAMA8EOB«S)>.4M7SP 
TRNUI TRMBPOeON TNUI UCflCURB KM RESISTANCE OPERON. U««SP 
TTMMIB PBEUDOMOMAS AERUOWOSATWMI GB* TNPA FOR TmNSPOSASB. M MB P 
TRNHIMER TAANSPOSaNTNMI UCRAOBtE BCOOMO MEftCURB REDUCTASE. IMTSP 
TRMOITNP T n AWSPOSONTNWI TWPWOB*. nmP 

TnNU4 TnMBP080NTNH4FnOMBJUjnCUS, COMPLETE. CGNrAMCTTUMPOSfTKM 
TMAl TRMBPOSONTMLBT|imVTB>N0CAT1NrTHaSCBRMTTWNSPOSASfiANO 
TTMSM TnANBPOaONTHiRIOKrPMERTEO REPEAT MTHGB* ran TRANSPOSASE. 



•nxIBOA /VPARROT/ULSTERrQ (HTNI). NEURMINIDASE (8EG •), 14S4BP 
I4RUEHZA AT ACMOW (HaN3}, HBMMMLUTMM (SEG 4). CQPUL ITMOP 
NPVUENZA *ALASKAW77 (HaN2). NONSTTtUCTURAL PROTEM (SEG «. MC6P 
MRUEKZA Art)ua«ALB£nfrAMn«(H1M),N0NaTnuCTUnAL PROTEN MCBP 
MPLUDOA /VTEmrAUBTALUV070&n (HI IW), NEUflAMMOASE (SEG •) 
MaUENZAMMMKOKrt/7«(HaN2),M1 M0M2 PRCTTEMStRPMSEG 7>, 
MaUBOA MANOKOK^m (KMQ, NCUflMiMOASfi (COMPUTS SEG •>. 
MRUENZA A/FOPTT M0NM0UTHnM7 (HINI). NONSTRUCTURAL PROTEM (SEG 
MaUBOA «FORT WARRENn/W (HINI). NOMTRUCTURAL PAOTEM (SEG •). 
MaUBCU MAPMO0M7 (HINa), HOMOOLUTHM (SCO 4). CONA. 
MRUBOA IMBiPm/%m (WNZ), HBIAOOLUTMH (SEG 4) RNA. 17M8P 

BAueoA Awr/w«MtC{»i)Nn. HEUAOdLurNN (SCO 4). complete 

MPLUGNU MMTtWU (HSNDlNEURAMMWASE (SEG •) WA. t447BP 
MRUBOA MfTM9M (KMQ, NUaE O PW CffE M (SEG t), CONA. IMCBP 
MRUBOA HMTmm (HaKQ. PaYMERASfi 1 (SCO O, CONA SMIBP 
MRUEKZA MHTmom (K»0). POLYMERASE t [SEG ») fMA C133BP 
MPUIBOA /VNT«WW tWKl). PaVMEHASS > (SEG I ). CCMA IM IBP 
MRUBOA iV S WM P W CW iERSEYn l/7t (HINI), HBMOaUTMM (SEG 4), 
MRUENZA AWEfCTO RIC0rtO4tCAMBRK>aO (HINI ). HEMAOOLUTMM 
MRUGNZA AfPUERTO RCCWH (CAMBRIDOEKHINI). MATRIX PROTEMS, 
PtfLUBOA JVPUERTO RCQfM4(CMBRnOE) (HiNl ), NEURMMMIDASfi 
HRMOa* «PUEfm> RCOrMHCAMRffiQE) (HINI), NUCLEOPROTEM 
MRUBOA JVPUERTO RCOrM4(CMSRI0aE) (HINI), HONSTRUCTUM. MOSP 
MRUBOA A*UERTORICOrVM(CAWBRK>OE) (HINI), PaYMERASfi lOilBP 
MRUBCA «PUCRTO RIC0fiq4(CAMB»0OE) (HINI). PaVMCRASfi t BSaSP 
PAUBOA ^PUERTO RCa«S«(CAM*ROGE) (HIND. PaYMERASfi > SMIBP 
MRUBOA A4ftM7 (HMD, NEURAMM«A8E (SCO «). COMA. MtTBP 
MRUBOA MFPWROSTOCWM (H7Nl)i HBAAOOLUnNM (SEG 4). CONA. 
iAUBOA MFPVmOBTOCKAA (H7N1). NOMTRUCTURAL PROTTMMOOP 
MRUBOA VT0KVOM7 (WMIL PCURAMMDASI (SCG <). COMPLETE 
INFLUENZA A/UDORN/72 (H3N2), MATRIX PROTEIN (SEG 7), CDNA. I037BP
MRUBOA AiUD0flNr72 {H3W>J«u n AMilOA 8 E(SEG f) PNk. I4M0P 
MRUBOA WUOORPm (H3M). NONSTRUCTURAL PROTEN (SEG •). CIMA 
•AUBOA MOUCMJKRAMSrWI (MM), H Q IAOOLUTMM (SCO 4).C0NA. 
•AUBOA AAJSSMV77 (HINI), NEURAAMMSE (SEG ■). CONA. UtSBP 
MRUBOA AAJSSA«tf7r(HIN1). NONSTRUCTURAL PROTEN (SCO MOSP 
i«LUBOAAMCT0RUW7l(K»a). lOtAOaUTMM (SEG *), COMA. 
NRUBOA WVKTORIAOnt (HMD. NEURAHNKMSE (SEG t), CONA. 
MRUENZA AAWSNaa (HINI). HBMOOLUTMM (SEG 4), CONA. 177«aP 
MRUENZA AWrSMU (HlNl). NEURAMMDASC (SEG 7). CONA. t40MP 
MRUBOA AM8M93 (HtNl). PaYMERASE f (SCO T). COML tM I BP 
PAUBOA AMSHOa (HINI L PaYMERASE S (SCO I ). CONA. SMI BP 
■AUBOA BMNOKONOWTa kewoaUTMM (HA) (SCO «) RNA, COMPLETE 
FOiHA tAUBOASCEEMO, MBtAOai/TVM (BEG 4) ANA. 1M2BP 
POIWZ IAUBaA»L£&M.»OAAOOLUT*M (SCO 4), COMPUTE SCGMBfT.IMaSP 
F01M tAUBOAkiElMO. MATRIX PROTEN fCOMPLCTE SCO 7) ANA> I IttBP 
F01NA MRUBOA B4.ECMe,NCURA«M«ASfi«NB (SCO •)rMA1M7SP 
FOiW MRUBOA BA.EEMO. NUCLEOPROTEM (SCO «, COMPUTE SMMB^. IMlBP 
F01NS PAUBOAWLEBM, NONSTRUCTURAL PROTeN(SCG R, RNA. lOMBP 
FOIHA tAUBOABCREG0N«M,HaMO0U/TMN (HA) GB* (SEG 4L COMPLETE 
PO«NP MRUBCA B*MGAPOR (Via Sff» NUCLE U PR U T EM OB* (RNA SCO ». IHBP 
KSHA MRUBCA CCAUPORNWTl, HEUMrtOLUTMN (SCO 4). CCNA. t07lSP 
FOMS MRUENZA CCALIFOnpSAmNB PROTEM (SCO A COMPLETE. KMSP 
FW3CV3 FROG VMUS 3 M«IATS«M.V CP U GB*. OMP 
OPMOVI P»«BBRABSCMGRANUL0S*vnuS(PROV)0RANUlNGB*.M«P 
OROiMD SOURRa iCPATmS WniS (GSM4. COMRETE OCNOMfi. aSIIBP 



fWUNA 
FMVHA 
PMYNA 
PWVP1 
FMYPa 



TRN7AA0A TRMBP0S0NTN7AA0A0e< BCOOStOSTRCFTOMYCN ADBfYLYLTRANBFERASE.GVrOG TRCHOPLUStA M GRANULOS* WtUS (TNGV) ORANULN OBC COMPLETE 

TWfTFa TRM«POS0NTN7 TYPE. I OSfYORORXATE REDUCTASE OB*. MSRP MAM* HANTAAN VIRUS. COMPLETE M MM SEOi^ COOMO FOR 0 1 AMI MM HSP 

TRNMORS TRA»«POS0NTNM)BWBITE0flCPtAT«KMMfV.RC8«T.OEI*rBC. HAMNC HAWTAM VWUS S SEOMEKT BCOOMO NUCLEOCAPSID PROTW 1MMP 

TRWI7 TRANSPOSON TWi1T(COWUTE^,MWRCU0E^JN C 0SAMC E •STREPTO0 n AMM B ISOO OUOC HEPATTT* B VIM. COMPUTE GBCME. MlBP 

TWKAT TRN«POS0NTN»wrTHCa0RM»HBSCa/CETVLTRMSFERASEC0S.lt4»P MVARV2CO HUMMA«)S-ASS0CWTEORSTRMRUS(ARV«,CCMPLETEPR0W«WLO^^ 

TmCAU2M TTUNSP0S0NTNCAAMMCM.0RAMPHB«caTRANBACETYlASCaB«.t2lBP WBRUCG HUUMLYMPMAOBCPATMV^ASSOCUTn vnU8(BRU ISOLATE), COMPLETE 

T7M.EUB TTWEnMOPHLUS LEUS GB* COOMQ FOR >eO>R0PYLMM>TE DEHYDROQCNASE HnCDC4t HUMMT<eiLYMP»CTROPe VWUS TYPE ■(C0&4ei),r4.TR AND GAG 

VCHCTX VSRK>CHaERATOXN(CT1Q OPERON OB*. KN8P MVC0C4* HUMM TCaX LYMPtCTROPC VWUS TYPE ■ (C0&4A1). TAT OB*. B#V 
HtVOSU HkM*MTCaXLV1UVMCrnV)PCvmuSTYPeM,SOaTAT.WTr.AN0>7K
Hn^ICO HUUW LYWPHAOmOPATHY VmUS (Rl SOLATE). COMf>LrrE GBKME. »1 MBP 
HfVH3DH» HUMAN TXaXLYWPHCrmOPtCVIftUSTYPC II. SOLATCBHS.OAaPa 
HIVHMHS HUUANT-CBl LEUKEMIA vmuSTVPC III AO8k«0LATCBW.EHVOeC 
MVKICO HUMN T^CaX LnVHOmOPtC VlftUS TYPC 11. COIfflETC REFiRBCE IT««P 
MWOOCO HUM* T^BXlVWHOmOfCVRUSTYPfim (Hit. COMPLETE O0IOMK 
HHftoezCO HUMM^T-CaXlVMPHCmtonC VIRUS TYPE M(H]e2),COMR.CTEa»0MC. 
WVUM.CO HUMAN LVMPHA0O<0PATHVVnU8tMALIS0LATO. COMPLETE O04OMC.«2«0P 
HMVe4V HUMWLVUPHOTROPCVnUSTYPCMtRFtaOtATO.OfVaeC.MnBP 
HTVMAJS MUMMLVMPHOrnOPIC VIRUS TYPE n.«MII»GUTE.mv CM. 2SMaP 
WV2I MUUANT-CaLLyUPMOrn0PICV«UBTVPCUi(ZAJREliaCLATE>.8OR, 
H.1PnOP HUMAN T-CBlLCLKBMWVnUSTVPC I (PROVmAL), COMPLETE OOCMS. 
H.1PX MUUANTCajLLGUKB«AVinuSTVPEl.PKaeCMf9M(t.tKS),IIMBP 

Hjioa HUMJW T-cai leukemu vnus type ■. complete pnownAL oshoue. 

it.2BN HUMJMTLVMPMOmOPtCVtRUSTYPGII.B4VGaiE. 14«4aP 
WAACG HUMAN HEPATITIS A VIRUS, COMPLETE GENOME. 747aeP
HPBADR HEPATITIS B VIRUS (SUBTYPE ADR), COMPLETE GENOME. StUOP
HPBADRCO HEPATITIS B VIRUS (SUBTYPE ADR), COMPLETE GENOME. aCNCAWU.
HPBADW HEPATITIS B VIRUS (SUBTYPE ADW), COMPLETE GENOME. 3200BP
HPBAOyW HUMAN ICPATTTQB VIRUS SUBTYPE AOYWWnGa4 0e<£Sf743eP 
HPBAYW HUMAN HEPATITIS B VIRUS (SUBTYPE AYW), COMPLETE GENOME. SIUBP
KPB»eSAO HEPATTm B VIRUS (SUBTYPE AYWtSURFACCWnOEN M S AO aENE.«2ieP 
KPBSAO HEPATTTS B VIRUS SURFACE ANTia04 MAJOR PRGFTEPC PI ANDPS.IHfiP 
HRSlCE HUMAN RCSPtRATORY SYNCVTIM. VIRUS QMS IC. IB, N. P. W, I A. 0. t. 
HRV HUMAN RHINOVIRUS TYPE U (HRVU), COMPLETE GENOME. niSP
HRVCO HUMAN RHINOVIRUS TYPE 2 (HRV2), COMPLETE GENOME, 7t03BP

HERPES SMR^CXVIRUB TYPE tO»S FOR >MMMPRGrrE»42«0«6P 
HS1733 «V I (K08) 746 MAP UMTS ettCOOMQ RJSION FUNCTION. K41BP 

HS1 ATV HSVt (STRAM F) M.PHA TW^ MWCtM P>VT0H, COMPLETE COS 2UIBP 
W1EX0 HSVt (KOS) MJUH.iC EXONUaEASE OWE, 0 17M.IMMAP UNTTS. M71BP 
HStOa HSV1 (KOS) GLYCOPROTEIN B GENE, COMPLETE CDS. STMBP
WIOBCP HSV1(STRAMF)OeAN0ICPtlJ0S«SENC00BMOLVCOPnOTEMB,AN0A 
HSIGC HSV-1 (KOB) OLVCOPROTEM C GM. HU ftM^Aft. 2M7BP 
HSIOCM HSV-1tffTRAff4 MP) DEFECTIVE QLYCOPRaTEHCaENE,UU0 134 «. 
HSIOO HSV1 GLYCOPROTEIN D GENE. IMMP

HSiOOa HERPES SMPLEX VIRUS TYPE tOLVCOPnorrEM^OOBC. COMPLETE COS. 
ieiEM6a HERPES 6MPLCX vnus TYPE IMMEDIATE-EARLY MmA4,C0MR.CTEGBC. 
HSI PCL HSV t 9(00) ORH. REOKM ANO POLYMERASE (POL) OM, COMPLETE COS. 
HSIP0Lt7 HSVt(l7)ORI.REOIOHPOLVMEnA8EWODNABff«}MaPnOTEMO0<ES, 
HSlTKWa HERPES SMPLEXVUUSTYPClTHYMOMEKNASE AM) 3KBLOe«S I40aBP 
H31 US HSVI (STTUM IT) COMPLETE SHORT UNKXJfi REOKM WTM PARTUL ItMBP 
HS»XO HSV 2 (Ha-tZt MXALMG EXCNUaEASS ODC. O.tn-O.tWMAP UMTB 
HS2QCtW HERPESSMPLEXVtftUSTYPEtQLYCOPnOTEMCiVAtKPnorrEPiaENES. 
HS300 HERPES SIMPLEX VfflUS TYPE t (HSV 2) aLVCOPAOTEM 0 (00-7) OOC tHD 
HSZOOa WRPE8 SMPLEX VIRUS TYPE 3 OLYCOPftOFTElNO OBC, COMPLETE COS. 
HS20F HSV2 (STRAMO) OLVCOPROTEW FOeC.COHPLETE COS Z3nBP 
H52Mnn HERPES SMPLEX vnUB(HSV> TYPE I TRM«FOfWNaREOI0NWRr.ri*2«BP 
HS3RR3MU) HERPES SMPLEX VnUS TYPE t (HSV-S) RSONUaEOnOE REDUCTASE MD 
HS71K HERPES SMR.EX VIRUS TYPE I (STRAM 333) THVMIDMKNASEOOIfi, 
HS4 EPSTEK«ARflV»tUSteBV)BM4STRAM, COMPETE OSiOME.ITmaaP 
HSAGai EP9TENBARR VMJ9 BIM ECOfll DHET TRANSCRtFTS FROM EOlt, M«(8P 
HS404VaP EPSTEt* BARR VIRUB MAJCR OUTER CMVaOPE OL VCOPROTEN OBCS 0P3M 
H84MP EP8TEM4ARR VIRUB, Pl^TATTVC LVDMA Oe«G. StOTBA 
HS4NA2I EPSTEIN-BARR VIRUS LEADER PROTEIN (LP), COMPLETE CDS, AND NUCLEAR
HS4NAt EPSTEIN-BARR VIRUS NUCLEAR ANTIGEN 1. 1 1StBP

H34U2«a EPSTEMBARR VIRUS (Aai7ltS0UTE)UimD0MAWBC00MaNUaEAR 
H$4UZ«QA EPSTENBARR vnus (BW-a ISOLATE) UI-(R2DCMAiN»COOt40 NUCLEAR 
HSftlERt HUMM CYTOMEOALOVKUS (STAAM TOWNS) REOION t tMEDlATE EARLY (IE) 
HSftiaiW HUMMCVTOMEOALOVmUSIRLREaiONe4CODNaA30KDPnOrTEN,tt7aBP 
H34MIE4 HUMAN CYTOMEOALOVIRUB fTOWC) MAJOR UMEDUTE EAAV (IE) 0»E. 
HSSOGtl PSEUOORABIES VIRUS (PRV)aLVCOPnOTEM on OOC. COMPLETE CDS. 
HSSTS HERPESVIRUS SAIMIRI THYMIDYLATE SYNTHASE GENE, COMPLETE CDS. 33MBP
«TTK MAHM06ET HERPESVIRUS (HARNV) TMYMDNE K»U8fi OM ANO FUVWS. 
BADi AVUMPlFECnOUSBRONCHTBvnuSV/ttCMSTnAffllMIMIMADI'BO, 
BBSPI CORONAVIRUS IBV GENE FOR SPIKE PROTEIN PRECURSOR 3MBP
ISVNO LASSA vnus (STRAMOAMtlNUaEOCAPSnoeCCOMRETE COS. lUOBP 
UAAf«U3A ALFALFA MOSAIC VIRUS (ST1UMS)f«IA 3. P3 WD R«PROrEM8.»«6BP 
MAARNAt ALFALFA MOSAIC vnus (STRAM4»LEDei)RNA 4 ENCOOMB VMM. COAT 
WOCOtl BEAN (KX0e4 MOSAIC VIRUS, ONA I OF COUP|JETE0B<0ME.tM«P 
hSOCOZZ OEWOGLDCN MOSAIC VIRUS. ONA a OF COMPLETE 0»0ME.2U1BP 
lARCOlZ BROME MOSAIC VIRUS RNA 1 OF COMPLETE GENOME. 3ZMBP
l«AeO«Z BROME MOSAIC VIRUS RNA 2 OF COMPLETE GENOME amO P
IfiRCOlZ BROME MOSAIC vnus (RUSSIAN 8TRAM)nNA3 (ANO nNA4)w OF tltlBP 
MCAIMI CAUUROWCR MOSAIC VRUS (STRAMCMtMIL COMPLETE OENOMi. MSlBP 
HCACOOH CAUUaOMCR MOBAC VnUB (ALTERED VIRULOCE BOLATE OM), COMPLETE 
UCASTRAS CAUUFL0WERM08AC vnus (STRASaOUnOSTRAVS, COMPLETE ae«WE. 
HCCfVMS COWPEAdCOnOnC MOTTLE vnus IMA 3. COMPLETE COAT PnOTEW COS 
UCFENVA MM( COl FOCUBFORHMO VIRUS (MCF-MULV. CM 60LATE), PROMM. 
MCVfMAlC CUCUMBER MOSAC vnus 10 STRAM)RNA I. C0Un.ETE.3inBP 
MCVRNAK CUCUWER MOSAIC vnus (08TnAM)f9(At COMPLETE. 3aa6aP 
HCVfMAS CUCUWER MOSAIC vnus (OSTRAMDRNA 3, COMR.ETlSGOM0fT»l«3BP 
UHVTM MOUSE ICPATmS VIRUS (STRMN JMI) NUCLEOCAPSO PflOTEN OOC, MflNA 
MHVEIN MOUSE HEPATTTB vnus UHVAM. El MO NPA0TEMS.M41BP 
hLAPAO ABCLSON MURNE LEIKEMU VIRUS (PAOWRM.). COMF^CTE OS«ME HMBP 
kCFFXBfV FReiOMNt cat FOCUB-NOUCNO VIRUS (FRIMMCF.ffmAMFRNX) 
hLFRO FRIM SPU» FOCUB^ORMMO MRUS (PROVnAL), COMKETl OB40ME. 
kCFVPetV FRSN0SPLEENF0CU8-F0RMN0VnuS(8FFV^O(Va0PE08«, ITSIBP 
hCM MOLON6YMURMLEUKBllMVnuSCOMR.ETEO0*OMC.U33SP 
a.RB«V RAUBCHERSR.EaPOCUS-FORMNOVnuS(PnOVIRAL)ewOENEMtOrLTR. 
kCVBIVPRM MOLONEY MNtC£UFOCUS-F0RMNaPR0VnM.gwOENC ANO LTRawOP 
U.VQ«VR MULVtflrmAJNRAOLVnurr*L*nGNV OENS. COMPLETE COS, ANO TLTR 
M.VRENV RAUSCWR MM CaL FOCUS FOnHNB vnus eNVG8« AND TLTRmTBP 
lyLXDIVXA HURffSLElMaMVIRUSN2B4-IX0iaTnOPCPAOV)RALONA.POLANOe«V 
liMTB4V MOUSE MAMMARY TUMOR VIRUS MRNA. SftiaBP

MfTBWOR MOUSE MAMMARY TUMOR VIRUS PROVIRAL ENVELOPE GENE REGION. 30I2BP
IMTLTR MMTV (OR MOUSE MAMMARY TUMOR VIRUS) LONG TERMINAL REPEAT DNA.
MPVTK MONKEYPOX VIRUS THYMIDINE KINASE GENE. IITaBP
MSHP91 HARVEY MURME SARCOMA VHUS Pn V MAS PRCTEM GBIfi. W7BP 
MS»V»tA MURM SARCOMA VIRUS (HARVEY-STRAVSH^RASOBC FOR TAAN6FORMN0 
MSJMUSV FBJ MURINE OSTEOSARCOMA VIRUS (PROVIRAL), COMPLETE GENOME. 422«P
MSKP3t KnSTQI MURME SARCOMA VIRUS PS1 V-KQ PROTEM GQC. MMSP 
MSMA08M1 HOLONEYMURM SARCOMA VIRUS (P«*av»M.)M0SM10e« ANO RJMKS. 
MSMPROCO MOLONEY MURINE SARCOMA VIRUS (PROVIRAL), COMPLETE GENOME. S«aBP
MSRMUSV FBR MURINE OSTEOSARCOMA VIRUS (PROVIRAL), COMPLETE GENOME. 311 IBP
MSV MAIZE STREAK VIRUS, COMPLETE GENOME. mTgP

M5VRAS BMAC MURME SARCOMA VIRUS H-AAS RaATED ONCOOM. COMPLETE CDS. 
MSVM08 MYELOPnaiFERATIVE SARCOMA VIRUS PROVIRN, V-M06 OBC, T LTR 
MTGA TOMATO GaDfiNMOSACVnU8.COMPON0fr A OF COMPETE OeiOME.ZUiBP 
MTRRt TOBACCO RATTlEVnuSfMA-X (CAM flTHAM) FOR C AP60 PRCTTf M. tTWBP 

yraoKCf TOBACCOMOSAicvnusicoiwPEArmAHJOKi coat prottem genes. 

WrVCO TOBACCO MOSAIC VIRUS, TOMATO STRAIN 04, COMPLETE GENOME, OMBP
MTW TOBACCO MOSAIC VIRUS (STRAIN VULGARE), COMPLETE GENOME. «3MaP
MTY» TURMPYaLOWM0BAICVnUBOa*0MCt»CAVY)rMA:3'«O.MP 
MTYCOAT TURNIP YELLOW MOSAIC VIRUS COAT PROTEIN MRNA, fl»46P
NPAP10B AUrOGRAPHA CM.FOfMCA NUaEAR POLVHVDnOSa VnUS Pt« GENE FOR 
NPBPH NUCLEAR POLYMEOROSIS VIRUS (OF SMORQ POLVHEORM GOC. 30MBP 
PCS 8MUM SARCOMA VIRUS (PROVnNJ. COMPLETE GBIOME. S771BP 
PC82 SMUM SARCOMA vnus (PR0WMj.TTVMSFORMMa REOKM. 2337BP 
PCSFMA PCWOE ARBMVinUS SMALL (S) IMA, COMPLETE. MIfBP 
PCSIMM P.ARB«AVnuB8fMATM).COOMOPOflTHENPROTEM.3O40BP 
PVNP HUMAN PARAMRUOCCA 3 vnuS NUCLEOCAPSD PROTTEM (NP| GSC. t MOBP 



P1FNPA HUMAN PARAnauaUA VIRUS TYPE 3 NUaEOCAPSIOPROTEN. COMPLETE 

PTPCD HUMAN PAAAMRUENZA 3 VIRUS MRNA »COOM1PJMDC PROTEINS. 20I3BP 

PLXO XPOLYOMAVIRUSfJCV). COMPLETE OeiOMG,ftt30eP 

PLV2 POLYOMAVIRUSSTRAMA3tMOA3)CCMPLETEOO<>MS,U*7BP 

aY3 POLYOMA vnuS 8TRAM A3 COMPLETE O0OIE. taMBP 

POLI POLIOVIRUS TYPE 1 (MAHONEY STRAIN), COMPLETE GENOME. 7«4«BP

POL&W POLXMRUS TYPE >{UWSMaSTnA»4, COMPLETE OOIOME.M37BP 

P0L3LIZ PCLK»nU8 TYPE 3 LEON 1ZA-t«(SAaMVACCffO. COMPLETE 0»OME. 

P013L37 PCLOvnuS PVLE0M97 (TYPE 3), COMPLETE Oe«0ME. 7431SP 

PPHII HUMAN PAPSLOMAVIflUS TYPE II (HPVI 111 COIfflETEa0IOME.T«O4BP 

PPH1A HUMM4PAPtlOMAynUSlA.COMPLETEOeMME.7IIIBP 

PPL LYMPHOTnOPCPAPOVAVnuS<LPV),COM>LCTEaOK3ME.U7tSP 

PPMVPt MONKEY B4.YMPNOTR0PICPAMVAVnuSP8n a FRAOMefTBCCOmaVPI. 

MONKEY BLYMPHOTROPC PAPOVAVIRUS MUTlMT (LPV -I*) PST1 B 2364aP 
PUKrATOROP»UiOvnUSSnNA.COOWaFOnN ANDNSPROTENS. IM4BP 
PVI PARVOVIRUS H-1, COMPLETE GENOME. StTttP

P\«OUN HUMAN PAPOVAVnUS SK. VARUMT OLN.OP, COMPLETE OBIOME. ft1B3BP 
PVBMI HUMAN PAPOVAVIRUS BK, VARIANT MM, COMPLETE GENOME. 4M1BP

PVFVP FaME PAr«.EUKOPesA VIRUS STRUCTXjnALPRorTEMoecs.vpt WO vn, 

RAVOP RABIES VIRUS (ERA STRAIN) GLYCOPROTEIN MRNA. IMIBP
RAVMtGN RABCS VIRUS M>. Ml, OJMONOeiES. COMPLETE COS. MO LOOIS. 
RBFTK SHOPE FIBROMA VIRUS THYMIDINE KINASE GENE, COMPLETE CDS. 1 1ISBP
REOOSlC REOVIRUS SEROTYPE 3 S1 GENE ENCODING SIGMA 1 PROTEIN, COMPLETE CDS.
REOSX REOVIRUS SEROTYPE 3 S3 GENE, NON-STRUCTURAL PROTEIN SIGMA NS.
RE0S4 REOVIRUS SEROTYPE 3, SEGMENT 4, MAJOR SURFACE PROTEIN SIGMA 3 GENE.
RHUAarr human ROTAVnuS (HUrAUSTRALMWTT, SEROTYPE 1). COMPLETE 8EG I, 
nOatO HUMAN WA ROTAVnUS OeC to FOR NONSTRUCTURAL OLVCOPROTEN NCVPft 
Ron HUMAN WA ROTAVIRUS 0»iE I FOR SEROTYPE -SPEC FX UtVOGi VP7. IMSP 
ROeSfiOA MMM WA ROTAVIRUS SEOMOTT « fMA. VPt CM. COMPLETE. 136«8P 
R0B10 UK BOVMS ROTAVIRUS RNA oe< SEOMQifT 10, COIAETE COB. TftlBP 
nOBi t UK eOVME ROTAVnUB RNA QBC SEGMENT 11, COMPLETE COS. M7BP 
ROBG UK BOVME ROTAVIRUB SER0TVPC SP6CIFC GLVCOPROTEN Oe«. tOUBP 
BOVtC ROTAVIRUS (RF8T1UM)SEaMefT«. MAJOR CAPSO PROTEIN, 
UK DOMNE ROTAVnUS SEOMOfT » COOMO fOR NONSTRUCTURAL PROTEM 
SIMIAN 11 ROTAVIRUS GENE 10 FOR NONSTRUCTURAL GLYCOPROTEIN NCVPI
SIMIAN 11 ROTAVIRUS SEGMENT I RNA, VP! GENE, COMPLETE. l3ftTBP
SIMIAN 11 ROTAVIRUS GENE 7 FOR NONSTRUCTURAL PROTEIN NCVP4. t KMBP
SIMIAN 11 ROTAVIRUS GENE I FOR NONSTRUCTURAL PROTEIN NCVP3. t06MP
SIMIAN 11 ROTAVIRUS MAJOR OUTER CAPSID PROTEIN VP7 (SEG «). CDNA
ROWPTNCD BOVINE ROTAVIRUS (STRAIN NCDV) SURFACE ANTIGEN VP7 GENE. lOKiSP
R0WP7S2 ROTAVIRUS (HUMAN SEROTYPE t SEGMENT f) VP7 SURFACE ANTIGEN. lOISP
RSVRAS RAT SARCOMA VIRUS (RASV) V RAS ONCOGENE, P» TRANSFORMING PROTEIN,
8FVZ SM.KI FOREST vnuB MS RNA MO JUNCTKMREGKMAaZUP 
8M SINDBIS VIRUS (HRSP AND WILD-TYPE STRAINS) COMPLETE GENOME. 1 1TOBP
SrVMPCO SIMIAN MASON-PFIZER D-TYPE RETROVIRUS (MPMWiA), COMPLETE GENOME,
SIVRVICO SIMIAN SRV-1 TYPE D RETROVIRUS MT.*), COMPLETE GENOME. IITSBP
3N0FZ SENDAI VIRUS (HVJ) F MRNA ENCODING FUSION GLYCOPROTEIN. II I3BP

SENDAI VIRUS HEMAGGLUTININ-NEURAMINIDASE GLYCOPROTEIN MRNA.



ROT010 
ROTO* 
ROT07 
ROTGA 
ROTVPT 



SN0V1 



SENDAI VIRUS M (MATRIX OR MEMBRANE PROTEIN) GENE, COMPLETE CDS.
SENDAI VIRUS MRNA FOR PROTEINS NP, P, C, M AND N-TERM OF F. MMSP
SNOWSHOE HARE VIRUS, COMPLETE M RNA SEGMENT CODING FOR G PROTEIN
8V4CG SIMIAN VIRUS 40 COMPLETE GENOME. IMSBP

8VWMP SNIM vnus A HEMAQOLimNMNEURAMMDASE (IM) PROTEM. fl236P 
8VSPFC 8MUN vnus I. PROTEM F MRML COMPLETE COS. II73BP 
SVCPM SPRING VIREMIA OF CARP VIRUS STRUCTURAL PROTEIN M, MRNA. 71 OBP
TSVCGHAT TOBACCO ETCH VIRUS (HIGHLY APHID TRANSMISSIBLE (HAT)) COMPLETE
TNS SATELLITE TOBACCO NECROSIS VIRUS, COMPLETE GENOME. I XNBP
TOTRVRNI TOBACCO RATTU vnus (TRV|RNAty.TERHMUS.222lBP 
TSVRNAa TOBACCO STREAK VIRUS (STRAIN WC) RNA 3, COMPLETE SEGMENT. QOUP
VACB<VMT VACCI«AVIRUS(HMDIIFFRAaMB<T)Pa7KOeCBCOOMOMB>IVB.OPE 
VACHA VACCINIA VIRUS HEMAGGLUTININ GENE, COMPLETE CDS. 1M3BP
VACHMOOO VACCINIA VIRUS (STRAIN WR) HINDIII D FRAGMENT DNA, COMPLETE.
VACMCPMB VACCINIA VIRUS P4B (MAJOR CORE POLYPEPTIDE) GENE, COMPLETE CDS.
VACMLO VACCINIA VIRUS MAJOR LATE GENE ENCODING A HK PROTEIN, COMPLETE
VACNTP VACCINIA VIRUS NUCLEIC ACID-DEPENDENT NUCLEOSIDE TRIPHOSPHATASE I
VACPao VACCINIA VIRUS DNA POLYMERASE GENE, COMPLETE CDS. lOMBP
VACPaO VACCMUVnuS(STRAMWR)HM0»LMOJFfU0Me«TS&ICOOMa 
VACPaR VACCMIA vnus (STRAMWI^HMOII J MO HFRAaMetTSENCOOMO RNA 
VARTK VARIOLA VIRUS THYMIDINE KINASE GENE. lf74aP

VAZmtS VARICELLA-ZOSTER VIRUS GENOME FRAGMENT WITH MAJOR INVERTED REPEAT
VAZUS VARICELLA-ZOSTER VIRUS (VZV) UNIQUE SEQUENCE (US) COMPONENT. MMSP
VLVCG VISNA LENTIVIRUS (PROVIRAL, ICELANDIC STRAIN tftU) COMPLETE GENOME,
VSV VESICULAR STOMATITIS VIRUS (INDIANA SEROTYPE) COMPLETE GENOME.
VSVOPORS VESICULAR STOMATITIS VIRUS (ORSAY) G PROTEIN MRNA, COMPLETE CDS.
VSWja VESICULAR STOMATITIS VIRUS POLYMERASE (L) GENE, COMPLETE CDS.
VS>MtJ VESICULAR STOMATITIS VIRUS (VSV-HI) MATRIX (M) PROTEIN, COMPLETE
VSWMI VESICULAR STOMATITIS VIRUS (NEW JERSEY) N PROTEIN MRNA, COMPLETE
VSVNS VESICULAR STOMATITIS VIRUS (NEW JERSEY) NS PROTEIN MRNA. lUSP
WW WOODCHUCK HEPATITIS VIRUS, COMPLETE GENOME. XM«P
WHVCG WOODCHUCK HEPATITIS VIRUS t COMPLETE GENOME. SSaOBP
WMVSAO WOODCHUCK HEPATITIS VIRUS SURFACE ANTIGEN GENE. IMBP
WNfCO WEST NILE VIRUS RNA, COMPLETE GENOME. lOMOOP
VFV YELLOW FEVER VIRUS, COMPLETE GENOME. lOM3BP

BACTERIOPHAGES

AL3J aACTERMPHAOEALPWUSJOM.lttBP 

BETDT CORYNEBACTERIOPHAGE BETA (C. DIPHTHERIAE) DIPHTHERIA TOXIN (DT) GENE
BETCTTZn CORYNEBACTERIOPHAGE BETA (C. DIPHTHERIAE) TOXIN AND R>M(8,22a«eP
KECG BACTERIOPHAGE KE, COMPLETE GENOME, UiaBP
KEOVP BACTERIOPHAGE KE DNA BINDING PROTEIN (OVP) GENE, COMPLETE CDS.



L^WMBIllI BACTERIOPHAOfiLMIBOAMAt3«ECORMFRAaUB4T.ia7BP 
UtS BACTERIOPHAGE M13, COMPLETE GENOME. ««07BP
HSa BACTERIOPHAGE MS2, COMPLETE GENOME. 36«I6P

P1MC BACLLU8 8UBTLB PHAGE PMIMiMUNTTV REOION MTH REPRESSOR GENE 
P21NRI aACTERX>PHAOEP»»XtN REOION BCOOMGTRMSCRVTKMIMBP 
P22CS BACTERIOPHAGE P22 C2 REPRESSOR GENE. MlBP

PZ2ERF BACTERIOPHAGE P22 ESSENTIAL RECOMBINATION FUNCTION (ERF) GENE.
P22MMt BACTEfllOPHAOfi Pa MMUNTYRCOKM (MB): ANT oeCCOOMO FOR 
P22PAC BACTERDPHAOE PTt TERMMA8E 0B« O PROTEM), COMFLEH COS. «3BP 
PzarnOP iACTERIOPHAafiPaRnHTOPEAONOMCl.flCPLICATIONaBCBIIANO 
PMPt BACTEflnPHAOEPM-XIOENESI.IOANOtlENCODMOPITAL.aoaP 
PMa BACTERMPHAGE P»»40 Ct SMMTTY REOION ENCOOMG THE N OOIE. ITTtSP 
PA2LC BACTERKlPHAaE PArlLC OM ENCOOMO OUTER MCMBRME FORM PROTEN. 
PFIC BACTERIOPHAGE F1, COMPLETE GENOME. MOTBP
PFID BACTERIOPHAGE PflOeCENCOOPIQ ONA BNOMO PROTEM 43WP 
PFiOITS BACTERIOFHAOGF109C8V,VIIANOVII,71IBP 

PRBPMCP BACTERIOPHAaE PRXP.AERUOMOSAtONABMDMG PROTEN ANO MAJOR COAT 
PF3MCP PHAGE PAUAIORCOAT PROTEM GO*. COMPLETE COOMO SGOUENCE. 4 ISP 
PFD BACTERIOPHAGE FD, STRAIN 471, COMPLETE GENOME, I40IBP
PO* BACTERIOPHAGE OA, COMPLETE GENOME. W77BP

PKITHVPO BACTERIOPHAGE PHI-3T THYP3 GENE ENCODING THYMIDYLATE SYNTHETASE.

PMUA BACTERIOPHAGE MU GENE A ENCODING TRANSPOSASE, COMPLETE CDS, S4(«P

PMUB BACTERIOPHAGE MU PROTEIN B GENE, COMPLETE CDS, MCBP

PMUGMAOM BACTERIOPHAGE MU BETA REGION MOM AND OM GENES AND 3' FLANK, 1M2aP

PWUMM BACTERIOPHAGE MU IMMUNITY REGION (LEFT END OF GENOME), 1HEBP

PPI CM I eACTERMPHAGE Pi CM G0C BCOOMO flSCOIAMASE. CIXL RfiCOMBNATKM 

PP1REP BACTERWFHAOE PI PLASMP R6PLI0ONUM0DA-P1:tRHCW.ICAT10N REG ION 

PP4a COLIPHAGE P4 EARLY AND LATE REGIONS. 3I1IBP

PT3Pa BACTERIOPHAGE T3 GENE 1 ENCODING RNA POLYMERASE, COMPLETE CDS
PT^aOT aMTEflK)MMlfiT4DCTMnaaWBCOC»aUTAAUC0SYlTIUMFERASC.
PT40ENV UCn(U0fHAOBT4OeWOa«B«C0C)M0BID(MJCLIWV,CCMnj^ 
PT40CXA lACTERMflWOf T4 0EXA0ENCCaUPLCTf CO«.tXMar 
PT4C BACTEftlC9m*aiT4|Oese«OOMLV»aZYMG.CaM^C()*.M«P 
Pr4mOTDl ajCTERnPHAOf T4FR0MTDOO«eC0l)M0 0»m)R0rajkTEft^ 
Pr4HUm» UCTBU0fHWET4TDaBCOC00»MTHVW0VUTf SVNT>U8C.KX0Nt. 
(>T4aiTmA BWTEfVOPHAOS T4 OBCS I , S7A, STB. PI. MO TflM aUSTERa. «01 BBP 

pr4aaa23 ucrrEf«on4AacT4oecsa(PAimM4iM)a(COiy»UTSco^ 

PT4O30 BJOERIOmMWT4O0K3O84COC>MODMAUO*SS.COMnxrSCO8.IHaaP 

PT4aR UCTEMOmAUT4 0S«Ja8COOi«SMIMHNOfWraOTEIt t340W 

PT40)M»7 MCTEf«OmAO8T4O0CtM««37|COMPUTEfc9»JWOM(PAmu^ 

PT40UaU MCTS«0PMA0CT4OeC8H,M>WU>r.4r,4«.4t,44,tt.nCOA. 

PT40CM£ tMTnomAaCT400«a(BCOONai»UU(U8C)«ON.C.COI#LETC 

PT4ai7(tt1 BMTB«0nMaGT4O»E8t7.MWDt1.n7BP 

PT4PSCT PH<MT4Pftn0e«F0ftP0LVWICLEOnKKP«k8CtMC) 

PT4UVBV MCTB«On4M»T4UV8VOCI<iWaVCDMTmReCOW»4ATKm PATHWAY. 

FT4UV8y80 BJCTB«0mA0CT4aBC8UVSX.4l.«tM,OAM,MiMDa0CBCO0Ma 

FTtCXOO BACTB«OPHMHinDtftO»f F0RVexGHUaEA8S.IMW 

PTSTRDO BACTCnOn«AQCnae«PORTIMA48P(OUC».Tt4af 

PTT SACTEfMPHMlt TT. CCUKSTI 08IOIIf.M«MeP 

PmO BACTEI«0nU<KT7aae8M.l.lU{PO6m0N»M,4M-W.mt,JINO 

PX1 BACT»«OPM«OCPHI-X174,CaMn.CTEOO«OMC.ft3MaP 

SPIimm BAmA(OMAOCSK>l^.8imUS}08Cr.MCCUHI0UOrBt.AW 
ftPtftO BACTB»OMUaiMOt^tuanift»OB«SniM>MGNCOCMNOMaUA 
SPtTVl •ACTCmOPHAOCtMITMMacnFnCNMCTOntfTFUOM.OOUnjTi 
8P2P0LQL BACT8»0mMC«^0B9.UanjS)0MLOC00M(MPaviCIUM. 
SnO BJWTB«0PHWSV«CTA«.SUVTUS)0OO«,OC0OfiaMMUMrTVI0^ 
SPCSMC 8MTCRKyHMCftmi«<mW 8. AUREUS) SWOBSCOOMORJRWTBF 
SPniTASE BACTBIt0mAa€SPn(B.ftUBmLB)0HAI«TMVlTRAN8WU8SaaC.tia^ 



vBCwmor vcAST<«jccftsv«tAOifr»i«flmAaMNnointccNrAM«)wo(V. 

VSCtmoSA VEA8T(S.CCnEVt8t^urrOCH0NDnuCS£ll-TmAAI«ATPAa&PfCrrEOIJ>ID 
murvAfll VeAflT(rCEI«V»IAC)WTOCMONO(WiVAmOBC.40jOAUaS.tl008P 
VnJinai VEAaT{TjaLAWUTA)ltfrOCHaM)fWLOHAIC1WE9«CYTOCHROIieOI^^ 
WmCVM YEAST 0 POME) lOTOCIWNOmAL COS OMBCOOMAfOCVTOCNnOM 



HUM HUMAN



CW CHICKEN

xa XENOPUS LAEVIS

mo DROSOPHILA MELANOGASTER

UZE MAIZE

V8C S. CEREVISIAE

ECO E. COLI



AIMCPPS8A AJIVBRDU8C»L0R0fU8THEflSCtDEBM0MOPAOrCMPBaA.CGk»LETS 
AGMITCYB2 AMOUUimMTAPOCYTOC»««OMEB(COaA)08C:EXOHMOI2BP 
AMITtNA AJ«XJUW8MT0NABETWEnH2fl««BMMaJlMCTI0N».l4440OP 
ASMhrrFORN AM»UJMWTAna-TT»M.ASPTIMAANOATPASf (SUBUMTQOENES. 
ASMfTURF AJ«OUJIMWrUMOefnF»)REAONO FRAMES URFI MO URM.I«xeP 
eiVCPATP* lAflLEYC»t.0A0n>ITAT7AM»UttMnB9»EANDlTnNAO8«8:»1f8P 
aOUr BOVINE MITOCHONDRION, COMPLETE GENOME. t«33MP
CRfiCPATra CHJWyDMMNASflCIMARtTICICOROPLASTATPtaeMBCODONaCP l 

cRECPfluvp ciia*MKMCPRnao«CBisPHoepmT8CAfM)nu8fLinofi8u«iNTr 
cflccpeaAs cio*wmiiCMxmnAffTp«Mooc.DcooMatt«DpnoreNOP4i70P 

CREWnjm CKiMY(>0M0NA8WM4Afl0ai«TOCHC»OWLUMQB&O1W 
OnOMTWI OMajlfOWaTBtMrCYTOCHnOKCOXnA8ESUBUWT«,ATPME«,7TfMA 
OmUTTOt OfK»OPHLAWTOCHOI«flULCVroCHini«IANDRMTfMAae«i.«M2tP 
ORVWTYI O.VAKlAAMTCVTOCMnOMECOnOABfilANDI^ATPAS».4TnNAaBS8. 
ORVWTYTDC OMtOPWA VNCUMMrTOCMa»ttRWLTnN*OENSaU«Ta:tCA>TI*«. t«»ftP 
EORCPEPTV CUQtOMOnK«JiC»«.OnOPlASTaONaATKN FACTOR TV OENi.»MSP 

Eoncppfts i.0AACusc»c0MPUMrTPeMaDCC00ttMP0AatKMTmuiKOt>4MttP 

EORCPPSSA EU0lBMORACI.»CM.0n0PlAflTPeftA0MOC00M(»»«0HEflWC»fi 

EOACPnCL EUOL0UORACI.SCMX)nOPtASTR«UOBS-l,UISPHOePHATECAfaOXVU8C 

EORCPflP? E.ORACLBCM.0AOFUI«rfllft0MMAiPf«0TEMSt7ANDStlOB«E8.13SIOP 

HlMir HUMAN MITOCHONDRION, COMPLETE GENOME. tMMeP

LEKPMAX LEntMWUTARBmXAEtCtCrOPUflTMAnCnCUONACaMFtETEIOntBP 

MPOCPCO MARCHANTIA POLYMORPHA (LIVERWORT) CHLOROPLAST, COMPLETE GENOME.

MUSMT MOUSE MITOCHONDRION, COMPLETE GENOME. tUVMP

MZECPATU MASS (Z-MAYS) CP COUPUNO FACTCn COMPLEX (CF-I) ISTA* ^SLON 

MZECPPSt MASECaOAOAAITPHOTOtYmMtPtlAIANOPtlAfOOfES. COMPLETE 

U»CP<«P84 MAaiC»COAOPLA«rnaOMMN.PmrTENt4 0CNB.tMtBP 

MZECPRUBP MAinCICO«IOPU«TRSaOBSaSPHOePHATECAf»OrrLA8EUWlESUBlMIT 

MZanCOXI UASBHnDCMONOmALCYTOC»«IONEC09aDA8EBUWMrri(COX|)OB«! 
MZeylTWOXt MAIZE tZMAVS) NT CVTDC»«0MEOnOA»E«JUMrTNaBC(MOXIKt7t«P 
rSUMTtOOO MCnAS8*MTCyTOCM)OMEOnDASSSUBUNrTn(COt«OOIB.1B4gP 
NCIMTCOU H.CRA8aAMrCYT0CHR0MfiOXYM86SUMMrTt0ENfi(C0aO)AN0FUM(S. 
NEWTCU1 RCIWB8AMTOCH0NDmM.OLaO»C.ATFA8SBUUJ»«T«.COMfUTECO». 
l«UMreaS3 M.CAA88ACVr-tMUrAMTMrT0CHQN0flULB«RnNA.E)C0NB1 ANOI. 
OOancVOa COKyTWUKRTERWHAMrTOCHONOnM.CVTOCMMI«OOaOA«ESUWJNrTU 
CHOCPCYTP oe<OTI<nA»4Oa(mFLAVTIOa0IEFO(IPnS.AMCVnXHra«F»T3OP 
PANtfTCOL PJMeiMMnOCHONORUL CYTOCHROME OKVOAMtUMMfTSOMtCOl), 
PAflMTCOl P.Mff«LIAEPCaGt4CVTOC»«0MEOXIWUSUBUNrTIANDHO1O«Et. 
PAAMTCVa PJUMaiACSPECtE»4)Mm)CH0NDflUiCVTOCMA0MiOaaOAaEei*UMri 
PARMTPIA PJkUnBJASPECCt4MrrOC*l0N0(WLPfKnENPtaeCOfmAN0t. 
PAflMTPlB PAURauEPGCCStMrrOCHONORULPnOTENPlOBCnaP 
PGACPCYF PCA(PMTIVlM}C»COnOPiAflTCYTOCHnOMEFamfiANDFUMtl.1>MBP 
PEACPU PEA(P.8ATM«|C»«.OROfUSTPHOTOaV9TailD9PROrENaaE.IOW 

PEMrrcou P6A(pjATmMUTcoiiaa«ENoooMacvToCHnoMiO)aoAttii3aattP 

RAIMTCYVT MT MT CYTOCHROME ■ OWE; PM>. THfV, OLU-TRNA OWES: AM) UHFI. 
RATWTCVOR RAT (aRATTUB) MT CVTDCHROME OMOMS SIWMIT n OM. TOaSP 
RATWTCVOe RAT(EPMaUE«kWLfiY)MnOCHONOflMLOeNESPOACVTOCHnOMBOXE)ASS 
RCMTCYOS nCfi(0RVZA8AT1VA)HrCVTOCHRaMfiOXKMSE8UBUMTI(COI9OBtE. 
SALCPP8I EMAPB AIM CP PRE-IW»aDQO PWUIU E y S I B i B w aEW AN B PWOTEW QO€. 
SMCPPS8A 00UWOIM0nUiCM.0A0nJOTP8aA0ENSF0n»-K0t*Pn0TEM0PtWIW 
BOnrCPPESi S0nrKANCPPMAQ8«(PHOT0evSTBinTHYLAK0nMQ«A«rCPR0TEN). 
SPCPAP 8PffMCH<S.OLBUCfiA)CMLOA0PlA8TP-tMCHLOA0PHVU.ALnA2217BP 
OPCPATBE SPMWHO.OLERACfiA) CP COUPLM FACTOR COMPLEX (ATPA8E):Mt4aP 
SPCPCYTB 8PtlACHnjWnOOBC»POAAPOCYTOCHnOMBWANDCYTOCHMME»P 
SPCPD2CE SPWACNCICOnOPLAffTPHOTOaYVTEMnPNOrTEM 09,44 KOflEACTXM- 
ftPCPPSSA tPBMCH(».OL«ACfiA>CaOf»PU«TTMnAltO«>MMlfWNSPnOTEMOe« 



AOO M)S«MRU8TYPSt 
MV HW4 

HBI HERPES SIMPLEX VIRUS TYPE 1

HB4 EPSTEIN-BARR VIRUS

IMBOA BACTERIOPHAGE LAMBDA

Pr4 BACTERIOPHAGE T4

WT7 BACTERIOPHAGE T7

liPOCP M. POLYMORPHA CHLOROPLAST GENOME

TOeCP N. TABACUM CHLOROPLAST GENOME



SPCPTOH SPBIACHCICOIIOnJWTMS-TRNAAMDRPSII 
ttPCPTOI SPHACHCICOnOPUWri^TmAAIOnBOaOMALPfOTBMLtANDttt 
SnOMTCOX) •OIUHUMfinOLOnMLOtMnoCHOMMULCVTOCHMMECOXIMMtUWNrTI 
SnOMTOOM MmHUttplUTAiafCtMnOCNOfaMk CYTOCHROME COKKMBESUMMT I 
TOaCPCO KTAMCUH(VAKM0MrY(UOW 41 C»COmPiAST. COMPLETE OONMS. 
TOttCPPSl TQMO0ON.TAaACUh|T>IVLN(0«)MnMANEPnOTB<aB«<P3a)AND 
TOBCPPeftA TOaMCOf«i)EilCYtCH^Myu»TMVliWCOC>IMaiUMPMrT»iaENE 
TO CP RP tS T0»ACOOCPPUrATMin»O»0M<L-PWOTBNCSlt(HPtS>AWDRJWS.4WP 
TOttCPRUBP TO»*COOCI10WOPLAlTWM a OE B EMPHOE>W»TECM» 0» iL AIE URQESl»UNff 
TOaCPVI* WraOM ■KTOaAOOOCW.OBOmMT QOm RPSU for n ^OSCMAt PWOTBH SK. 
TOacrrOI tobacco (MOaMYQCHLOnOPUaTMS-TIMAAM) hps trOBCS.M«P 
TOK^TW TOMCCOpt DONCYqCMOIIOPIMTI^TnMA^^ 

TRVWTCYTB T JRUCEI MfTOC HONORUL UAXICMOLi DMA 0COOPIO AMCYTOCMMME i, AM) 

wHTCPATP Mi«ATCM.oiiOPL«aT«rvsinmASfipnorroM^iiuMLOCATMasu«uNrrae«. 

WHTCPATPft «MGATCM.OfWnJWrATFtVMTHErASICMStiMiNn'IOS4E.COiNKETE. 

MMTCPCYF mcATCiconopuMTCVTOcmoMEFaacntnp 

wmCPCYTB WHEAT CtCOVKMyAITOMPOR CYTOCHROME MM. W4iP 

wHTwrcot m«ATMrTocMOMiRULCYTOCMROMoaao«Mtu«tMrri(CC«)OGW. 

MMTWrCYia m«ATMrT0CH0M)RULiVOCVTOCHMMSSOB«.COMFlVnCOt.lM3BP 
X&MTCO X. LAEVIS MITOCHONDRION, COMPLETE GENOME. tTUJiP
Xajmrra XLAEV«MrTOCHaW)RML0HACONrW4MOT>ISftLO0».AM>TMEItSRRNA. 
VSCMTAPSl YEAST (S.C0tt\MM)MrrOC»O«lftULATPiaEtUWMrOB«.l4UlP 
YSCMTATtl YCAST{SjCEREWSIAE)MnOCH0M)RMLATPMEPflCirBOL»«{SUWJMTt> 
YSCMTCOSI VCAirr(SjCfiREVaiAiQHaOCHOM}nULOaatOe«aCCOMCYTOCMtOMEC 
YBGMTCOCS YEAST (S.C«V»WBI«TOCM0M)RHLCYTOC»M0MSCOa(IMMSUBUNIT 
VSCWrCY98 YSMT(SCEPCWAE)HnOCH0NDnULCYTOC»nQMESSH0(n-0e«E.aMBP 
VBCWrCYOI YEM(r(S£EREVMIAE)ISTOCHONOf«ALCYTOC»nOMBOOaOA8ESWiNrri 
YSCMTGYOa YEAST(S£CREV«UE)NnOCH0NOnUL CYTOCHROME OxnASESUKMrf 
VSCWTOUt VEA5T(S.CERCVISIAE)Wr0LhGMB<C00M0ATPASSftUSUMTf.SlSP 
VSCWTOO VEMT iS.CEREVniAC) UfTOCHONORUL OXCyOLB MTERCBTnOmC REOKM 